MTQ-Eval: Multilingual Text Quality Evaluation for Language Models
By: Rhitabrat Pokharel, Ameeta Agrawal
Potential Business Impact:
Helps computers judge good writing in many languages.
The use of large language models (LLMs) for evaluating outputs is becoming an increasingly effective and scalable approach. However, it remains uncertain whether this capability extends beyond task-specific evaluations to more general assessments of text quality, particularly in multilingual contexts. In this study, we introduce MTQ-Eval, a novel framework for multilingual text quality evaluation that learns from examples of both high- and low-quality texts, adjusting its internal representations accordingly. To develop MTQ-Eval, we first automatically generate text quality preference data and then use it to train open-source base LLMs to align with ratings of high- and low-quality text. Our comprehensive evaluation across 115 languages demonstrates the improved performance of the proposed model. Upon further analysis, we find that this enhanced evaluation capability also leads to notable improvements in downstream tasks.
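To make the data-generation step more concrete, here is a minimal, hypothetical sketch of the kind of text quality preference record the abstract describes: automatically generated pairs of high- and low-quality text used to align a base LLM. The field names, the prompt wording, and the A/B pairing scheme are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a multilingual text-quality preference record.
# Field names and pairing scheme are assumptions for illustration only.

from dataclasses import dataclass
import json


@dataclass
class QualityPreferencePair:
    language: str   # ISO language code (the paper evaluates 115 languages)
    prompt: str     # instruction asking the model to judge text quality
    chosen: str     # preferred answer (points to the high-quality text)
    rejected: str   # dispreferred answer (points to the low-quality text)


def make_pair(high_text: str, low_text: str, language: str) -> QualityPreferencePair:
    """Pair a high-quality text with a degraded counterpart as a preference example."""
    prompt = (
        "Which of the following two texts is of higher quality?\n"
        f"A: {high_text}\n"
        f"B: {low_text}"
    )
    return QualityPreferencePair(language=language, prompt=prompt, chosen="A", rejected="B")


if __name__ == "__main__":
    pair = make_pair(
        high_text="The committee approved the proposal after a thorough review.",
        low_text="committee the proposal approve after review thorough a",
        language="en",
    )
    print(json.dumps(pair.__dict__, indent=2, ensure_ascii=False))
```

Records of this shape could then feed a standard preference-alignment procedure over an open-source base LLM, consistent with (but not confirmed as) the training setup summarized in the abstract.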
Similar Papers
Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation
Computation and Language
Tests AI language skills better for smarter tools.
LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods
Computation and Language
Makes computer translations much better.
M2G-Eval: Enhancing and Evaluating Multi-granularity Multilingual Code Generation
Computation and Language
Tests how well computers write code in many ways.