arXiv Analytics

arXiv:1808.08438 [cs.CL]

Paraphrases as Foreign Languages in Multilingual Neural Machine Translation

Zhong Zhou, Matthias Sperber, Alex Waibel

Published 2018-08-25, Version 1

Paraphrases, expressions of the same semantic meaning in different words, are often useful for improving generalization and translation performance. However, prior work explores paraphrases only at the word or phrase level, not at the sentence or document level. In contrast, we use different translations of the whole training data, consistent in structure, as paraphrases at the corpus level. Our corpus contains parallel paraphrases in multiple languages from various sources. We treat paraphrases as foreign languages, tag source sentences with paraphrase labels, and train in the style of multilingual Neural Machine Translation (NMT). Experimental results indicate that adding paraphrases improves rare-word translation and increases entropy and diversity in lexical choice. Moreover, adding source paraphrases improves translation performance more effectively than adding target paraphrases. Combining source and target paraphrases boosts performance further; combining paraphrases with multilingual data also helps, but with mixed results. We achieve a BLEU score of 57.2 for French-to-English translation, training on 24 paraphrases of the Bible, which is about 27 BLEU points above the WMT'14 baseline.
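The tagging step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the tag format (`<label>`), the choice to tag each source sentence with the label of the target paraphrase (borrowed from the common multilingual NMT convention), and the names `tag_source` and `build_pairs` are all assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple


def tag_source(sentence: str, label: str) -> str:
    """Prepend a paraphrase label token to a source sentence.

    The "<label>" token format is an assumption, not taken from the paper.
    """
    return f"<{label}> {sentence}"


def build_pairs(
    sources: Dict[str, List[str]],
    targets: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Pair every source paraphrase corpus with every target paraphrase corpus.

    Each corpus is a list of sentences aligned by index with all the others.
    Assumption: the source is tagged with the *target* paraphrase label, so a
    single model can be asked for a specific target version.
    """
    pairs: List[Tuple[str, str]] = []
    for (_, src_sents), (tgt_label, tgt_sents) in product(
        sources.items(), targets.items()
    ):
        if len(src_sents) != len(tgt_sents):
            raise ValueError("paraphrase corpora must be sentence-aligned")
        for src, tgt in zip(src_sents, tgt_sents):
            pairs.append((tag_source(src, tgt_label), tgt))
    return pairs


# Hypothetical toy data: two source-side and two target-side paraphrases.
toy_sources = {"fr_a": ["bonjour le monde"], "fr_b": ["salut le monde"]}
toy_targets = {"en_a": ["hello world"], "en_b": ["hi world"]}
toy_pairs = build_pairs(toy_sources, toy_targets)
```

With S source paraphrases and T target paraphrases of an N-sentence corpus, this sketch yields S × T × N training pairs, which is how corpus-level paraphrases can multiply the training data without any new text being collected.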

Related articles: Most relevant | Search more
arXiv:2109.06679 [cs.CL] (Published 2021-09-14)
Efficient Inference for Multilingual Neural Machine Translation
arXiv:2502.02577 [cs.CL] (Published 2025-02-04)
A comparison of translation performance between DeepL and Supertext
arXiv:2501.02979 [cs.CL] (Published 2025-01-06)
Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation