Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model
By: Cristian García-Romero, Miquel Esplà-Gomis, Felipe Sánchez-Martínez
Potential Business Impact:
Detects machine-translated sentences in training data so that translation systems can be built from cleaner, human-produced text.
Modern machine translation (MT) systems depend on large parallel corpora, often collected from the Internet. However, recent evidence indicates that (i) a substantial portion of these texts are machine-generated translations, and (ii) an overreliance on such synthetic content in training data can significantly degrade translation quality. As a result, filtering out non-human translations is becoming an essential pre-processing step in building high-quality MT systems. In this work, we propose a novel approach that directly exploits the internal representations of a surrogate multilingual MT model to distinguish between human and machine-translated sentences. Experimental results show that our method outperforms current state-of-the-art techniques, particularly for non-English language pairs, achieving gains of at least 5 percentage points of accuracy.
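The abstract does not say which internal representations are used or how they are classified, so the following is only an illustrative sketch of the general idea: pool per-token vectors into one sentence vector, then fit a simple probe that separates human from machine translations. Everything here is an assumption made for illustration: the mean pooling, the logistic-regression probe, the function names (`mean_pool`, `train_probe`, `predict`), and the synthetic vectors that stand in for real hidden states of a surrogate MT model. It is not the authors' method.

```python
import math
import random


def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size sentence vector."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(tok[i] for tok in token_vectors) / n for i in range(dim)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with plain SGD (label 1 = machine)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (machine-translated) or 0 (human-translated)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Synthetic stand-ins for hidden states (assumption: a real pipeline would
# read these from the surrogate MT model instead of sampling them).
rng = random.Random(0)


def fake_sentence(center, n_tokens=6, dim=4):
    return [[rng.gauss(center, 0.3) for _ in range(dim)] for _ in range(n_tokens)]


human = [mean_pool(fake_sentence(-1.0)) for _ in range(20)]
machine = [mean_pool(fake_sentence(1.0)) for _ in range(20)]
features = human + machine
labels = [0] * 20 + [1] * 20

weights, bias = train_probe(features, labels)
accuracy = sum(predict(weights, bias, x) == y for x, y in zip(features, labels)) / len(labels)
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

The synthetic classes are well separated by construction, so the accuracy printed here says nothing about real data; the sketch only shows the shape of a representation-based detector.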