Improving Retrieval-Augmented Neural Machine Translation with Monolingual Data
By: Maxime Bouthors, Josep Crego, François Yvon
Potential Business Impact:
Improves machine translation by letting it draw on text available only in the target language.
Conventional retrieval-augmented neural machine translation (RANMT) systems leverage bilingual corpora, e.g., translation memories (TMs). Yet, in many settings, in-domain monolingual target-side corpora are often available. This work explores ways to take advantage of such resources by retrieving relevant segments directly in the target language, based on a source-side query. For this, we design improved cross-lingual retrieval systems, trained with both sentence-level and word-level matching objectives. In our experiments with two RANMT architectures, we first demonstrate the benefits of such cross-lingual objectives in a controlled setting, obtaining translation performance that surpasses standard TM-based models. We then showcase our method in a real-world setup, where the target monolingual resources far exceed the amount of parallel data, and observe large improvements from our new techniques, which outperform both the baseline setting and general-purpose cross-lingual retrievers.
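To make the retrieval step concrete, here is a minimal sketch of cross-lingual retrieval over a monolingual target-side datastore: a source-language query and the target-language segments are embedded in a shared space, and the nearest segments are returned as context for the translation model. It assumes a generic off-the-shelf multilingual encoder (LaBSE via the sentence-transformers library) and toy data; the paper instead trains its own retriever with sentence-level and word-level matching objectives, which is not reproduced here.

```python
# Sketch: cross-lingual retrieval of target-language segments from a
# source-language query, using a generic multilingual sentence encoder.
# This illustrates the general idea only, not the authors' trained retriever.

import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/LaBSE")

# In-domain target-side monolingual corpus (illustrative examples).
target_segments = [
    "Le contrat entre en vigueur à la date de signature.",
    "Les parties conviennent de régler le litige à l'amiable.",
    "La livraison est effectuée dans un délai de trente jours.",
]

# Pre-compute normalized embeddings for the monolingual datastore.
seg_emb = encoder.encode(target_segments, normalize_embeddings=True)

def retrieve(source_query: str, k: int = 2):
    """Return the k target-language segments closest to the source query."""
    q = encoder.encode([source_query], normalize_embeddings=True)
    scores = seg_emb @ q[0]              # cosine similarity (embeddings are normalized)
    top = np.argsort(-scores)[:k]
    return [(target_segments[i], float(scores[i])) for i in top]

# Source-side query in English; the retrieved French segments would be passed
# to the retrieval-augmented NMT decoder as additional context.
print(retrieve("The contract takes effect on the date of signature."))
```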
Similar Papers
Evaluation of NMT-Assisted Grammar Transfer for a Multi-Language Configurable Data-to-Text System
Computation and Language
Makes computers write the same story in many languages.
Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task
Computation and Language
Helps computers answer questions in any language.
Context-Aware Monolingual Human Evaluation of Machine Translation
Computation and Language
Lets people check translations without the original text.