Quantifying Language Disparities in Multilingual Large Language Models

Published: August 23, 2025 | arXiv ID: 2508.17162v1

By: Songbo Hu, Ivan Vulić, Anna Korhonen

Potential Business Impact:

Measures more reliably how fairly multilingual language models perform across languages, especially low-resource ones.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Results reported in large-scale multilingual evaluations are often fragmented and confounded by factors such as target languages, differences in experimental setups, and model choices. We propose a framework that disentangles these confounding variables and introduces three interpretable metrics (the performance realisation ratio, its coefficient of variation, and language potential), enabling a finer-grained and more insightful quantification of actual performance disparities across both (i) models and (ii) languages. Through a case study of 13 model variants on 11 multilingual datasets, we demonstrate that our framework provides a more reliable measurement of model performance and language disparities, particularly for low-resource languages, which have so far proven challenging to evaluate. Importantly, our results reveal that higher overall model performance does not necessarily imply greater fairness across languages.
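
The abstract names the three metrics but does not define them, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that "language potential" can be treated as a per-language reference ceiling and that the performance realisation ratio is a model's score divided by that ceiling; both are hypothetical readings. The coefficient of variation is the standard statistic (standard deviation divided by the mean). All names and numbers are invented for illustration and are not results from the paper.

```python
from statistics import mean, pstdev


def realisation_ratios(scores, potentials):
    """Per-language ratio of achieved score to an ASSUMED per-language ceiling.

    This definition is a hypothetical stand-in; the abstract does not give one.
    """
    return {lang: scores[lang] / potentials[lang] for lang in scores}


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (standard definition)."""
    values = list(values)
    return pstdev(values) / mean(values)


# Toy numbers, invented purely for illustration (not taken from the paper).
potentials = {"en": 90.0, "sw": 60.0, "yo": 50.0}  # assumed ceilings
model_a = {"en": 88.2, "sw": 48.0, "yo": 30.0}     # strong on average, uneven
model_b = {"en": 72.0, "sw": 45.0, "yo": 35.0}     # weaker on average, even

ratios_a = realisation_ratios(model_a, potentials)
ratios_b = realisation_ratios(model_b, potentials)

mean_a = mean(ratios_a.values())
mean_b = mean(ratios_b.values())
cv_a = coefficient_of_variation(ratios_a.values())
cv_b = coefficient_of_variation(ratios_b.values())

print(f"model_a: mean ratio={mean_a:.3f}, CV={cv_a:.3f}")
print(f"model_b: mean ratio={mean_b:.3f}, CV={cv_b:.3f}")
```

In this toy setup, the hypothetical `model_a` has the higher mean ratio but also the higher coefficient of variation, which mirrors the abstract's point that better overall performance need not mean a more even spread across languages.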

Country of Origin
🇬🇧 United Kingdom

Page Count
16 pages

Category
Computer Science: Computation and Language