Investigating the Multilingual Calibration Effects of Language Model Instruction-Tuning
By: Jerry Huang, Peng Lu, Qiuhao Zeng, and more
Potential Business Impact:
Makes AI understand many languages better.
Ensuring that deep learning models are well-calibrated in their predictive uncertainty is essential to maintaining their trustworthiness and reliability, yet despite rapid advances in foundation model research, the relationship between large language models (LLMs) and their calibration remains an open area of research. In this work, we examine a critical gap in the calibration of LLMs in multilingual settings, in an attempt to better understand how data scarcity can lead to different calibration effects and how commonly used techniques apply in these settings. Our analysis on two multilingual benchmarks, covering 29 and 42 languages respectively, reveals that even in low-resource languages, model confidence can increase significantly after instruction-tuning on high-resource-language SFT datasets. However, the accompanying improvements in accuracy are marginal or non-existent, resulting in mis-calibration and highlighting a critical shortcoming of standard SFT in multilingual settings. Furthermore, we observe that label smoothing is a reasonable method to alleviate this concern, again without any need for low-resource SFT data, maintaining better calibration across all languages. Overall, this highlights the importance of multilingual considerations when both training and tuning LLMs, in order to improve their reliability and fairness in downstream use.
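To make the calibration discussion concrete, here is a minimal PyTorch sketch, not the authors' code, of the two pieces the abstract refers to: expected calibration error (ECE), a standard confidence-vs-accuracy gap metric assumed here as the calibration measure since the abstract does not name one, and label smoothing, which PyTorch exposes via the `label_smoothing` argument of `cross_entropy`. The toy batch shapes and the smoothing value 0.1 are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def expected_calibration_error(confidences, correct, n_bins=10):
    # Bin predictions by confidence, then average the per-bin
    # |accuracy - confidence| gap, weighted by each bin's share of samples.
    ece = torch.zeros(())
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].float().mean()
            conf = confidences[in_bin].mean()
            ece += in_bin.float().mean() * (acc - conf).abs()
    return ece.item()

# Toy logits standing in for an LLM's answer scores: 8 examples, 5 options.
logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))

# Standard SFT loss vs. label-smoothed loss (0.1 probability mass spread
# uniformly over classes), the mitigation the paper studies.
hard_loss = F.cross_entropy(logits, targets)
smooth_loss = F.cross_entropy(logits, targets, label_smoothing=0.1)

# Confidence = max softmax probability; compare it to correctness via ECE.
probs = logits.softmax(dim=-1)
confidences, preds = probs.max(dim=-1)
print(expected_calibration_error(confidences, preds == targets))
```

On held-out data, rising confidence with flat accuracy, as the abstract describes for low-resource languages after high-resource SFT, shows up directly as a larger ECE.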
Similar Papers
Calibrated Language Models and How to Find Them with Label Smoothing
Machine Learning (CS)
Makes AI smarter and more honest.
Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models
Computation and Language
Makes AI more honest about what it knows.
Calibrating Beyond English: Language Diversity for Better Quantized Multilingual LLM
Computation and Language
Makes AI understand many languages better.