Translation Entropy: A Statistical Framework for Evaluating Translation Systems
By: Ronit D. Gross, Yanir Harel, Ido Kanter
Potential Business Impact:
Measures how good computer translators really are.
The translation of written language has been known since the 3rd century BC; however, the need for it has become increasingly common in the information age. Today, many translators exist, based on encoder-decoder deep architectures; nevertheless, no quantitative, objective methods are available to assess their performance, likely because the entropy of even a single language remains unknown. This study presents a quantitative method for estimating translation entropy, built on the following key finding: for a given translator, several sentences that differ from a given pivot sentence by only one selected token yield identical translations. Analyzing the statistics of this phenomenon across an ensemble of such sentences, each containing a selected pivot token, yields the probabilities of replacing this specific token with others while preserving the translation. These probabilities constitute the entropy of the selected token, and the average across all selected pivot tokens provides an estimate of the translator's overall translation entropy, which is enhanced along the decoder blocks. This entropic measure allows for the quantitative ranking of several publicly available translators and reveals whether mutual translation entropy is symmetric. Extending the proposed method to the replacement of two tokens in a given pivot sentence demonstrates a multiplicative effect, where the translation degeneracy is proportional to the product of the degeneracies of the two tokens. These findings establish translation entropy as a measurable property and an objective benchmark for artificial translators. Results are based on the MarianMT, T5-Base and NLLB-200 translators.
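The per-token entropy and its average over pivot tokens can be illustrated with a minimal sketch. The abstract does not state the exact formula, so this assumes the Shannon entropy (in bits) of the empirical replacement distribution. The function names and the count dictionaries are hypothetical and are not taken from the paper.

```python
import math


def token_entropy(replacement_counts):
    """Shannon entropy in bits of one pivot token's replacement distribution.

    replacement_counts maps each replacement token to the number of
    sentences in which substituting it left the translation unchanged.
    (Assumed input format; the paper's own data layout is not given.)
    """
    total = sum(replacement_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in replacement_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def translation_entropy(per_token_counts):
    """Average the per-token entropies over all selected pivot tokens."""
    entropies = [token_entropy(c) for c in per_token_counts.values()]
    return sum(entropies) / len(entropies) if entropies else 0.0


# Hypothetical counts for two pivot tokens (illustrative only, not real data).
counts = {
    "big": {"large": 5, "huge": 5},
    "cat": {"kitten": 4},
}
print(translation_entropy(counts))
```

Under this assumption, a pivot token whose replacements are uniformly distributed over k alternatives has entropy log2(k), so a multiplicative degeneracy for two independently replaced tokens would correspond to additive entropies.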
Similar Papers
Know Your Limits: Entropy Estimation Modeling for Compression and Generalization
Computation and Language
Makes computers understand and write language better.
Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models
Computation and Language
Makes AI know when it's unsure about answers.
Estimating Machine Translation Difficulty
Computation and Language
Finds hard sentences for computer translators.