arXiv:1808.08438 [cs.CL]

Paraphrases as Foreign Languages in Multilingual Neural Machine Translation

Zhong Zhou, Matthias Sperber, Alex Waibel

Published 2018-08-25, Version 1

Paraphrases, which express the same semantic meaning in different words, are often useful for improving generalization and translation performance. However, prior work explores paraphrases only at the word or phrase level, not at the sentence or document level. In contrast, we use different translations of the whole training data, consistent in structure, as paraphrases at the corpus level. Our corpus contains parallel paraphrases in multiple languages from various sources. We treat paraphrases as foreign languages, tag source sentences with paraphrase labels, and train in the style of multilingual Neural Machine Translation (NMT). Experimental results indicate that adding paraphrases improves rare word translation and increases entropy and diversity in lexical choice. Moreover, adding source paraphrases improves translation performance more effectively than adding target paraphrases. Combining source and target paraphrases boosts performance further; combining paraphrases with multilingual data also helps, but with mixed results. Training on 24 paraphrases of the Bible, we achieve a BLEU score of 57.2 for French-to-English translation, roughly 27 points above the WMT'14 baseline.
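The tagging step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the tag format or which label is attached, so the `<2label>` prefix (naming the target paraphrase, in the style of multilingual NMT target tokens), the function name `tag_pairs`, the labels `fr_a`, `fr_b`, `en_a`, and the toy sentences are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product


def tag_pairs(source_paraphrases, target_paraphrases):
    """Build tagged (source, target) training pairs.

    Each argument maps a paraphrase label to a list of sentences.
    All lists are assumed to be aligned sentence by sentence.
    """
    pairs = []
    for (src_label, src_sents), (tgt_label, tgt_sents) in product(
        source_paraphrases.items(), target_paraphrases.items()
    ):
        if len(src_sents) != len(tgt_sents):
            raise ValueError("paraphrase corpora must be sentence-aligned")
        for src, tgt in zip(src_sents, tgt_sents):
            # Assumed tag format: prefix the source sentence with the label
            # of the target paraphrase the model should produce.
            pairs.append((f"<2{tgt_label}> {src}", tgt))
    return pairs


# Hypothetical toy data: two French paraphrases, one English paraphrase.
french = {
    "fr_a": ["Au commencement", "Que la lumière soit"],
    "fr_b": ["Au début", "Qu'il y ait de la lumière"],
}
english = {"en_a": ["In the beginning", "Let there be light"]}

pairs = tag_pairs(french, english)
for src, tgt in pairs:
    print(src, "->", tgt)
```

Pairing every source paraphrase with every target paraphrase is one plausible way to realize training on both kinds of paraphrases at once; with this scheme the number of training pairs grows with the product of the two paraphrase counts.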

Categories: cs.CL

Keywords: multilingual neural machine translation, foreign languages, target paraphrases boosts performance, corpus contains parallel paraphrases, translation performance

Related articles:

arXiv:2109.06679 [cs.CL] (Published 2021-09-14)

Efficient Inference for Multilingual Neural Machine Translation

Alexandre Berard, Dain Lee, Stéphane Clinchant, Kweonwoo Jung, Vassilina Nikoulina

arXiv:2502.02577 [cs.CL] (Published 2025-02-04)

A comparison of translation performance between DeepL and Supertext

Alex Flückiger, Chantal Amrhein, Tim Graf, Philippe Schläpfer, Florian Schottmann, Samuel Läubli

arXiv:2501.02979 [cs.CL] (Published 2025-01-06)

Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation

Zhi Qu, Yiran Wang, Jiannan Mao, Chenchen Ding, Hideki Tanaka, Masao Utiyama, Taro Watanabe