Beyond the Final Layer: Intermediate Representations for Better Multilingual Calibration in Large Language Models
By: Ej Zhou, Caiqi Zhang, Tiancheng Hu, and more
Potential Business Impact:
Makes AI confidence estimates more trustworthy in languages other than English.
Confidence calibration, the alignment of a model's predicted confidence with its actual accuracy, is crucial for the reliable deployment of Large Language Models (LLMs). However, this critical property remains largely under-explored in multilingual contexts. In this work, we conduct the first large-scale, systematic study of multilingual calibration across six model families and over 100 languages, revealing that non-English languages suffer from systematically worse calibration. To diagnose this, we investigate the model's internal representations and find that the final layer, biased by English-centric training, provides a poor signal for multilingual confidence. In contrast, our layer-wise analysis uncovers a key insight: late-intermediate layers consistently offer a more reliable and better-calibrated signal. Building on this, we introduce a suite of training-free methods, including Language-Aware Confidence Ensemble (LACE), which adaptively selects an optimal ensemble of layers for each specific language. Our study highlights the hidden costs of English-centric alignment and offers a new path toward building more globally equitable and trustworthy LLMs by looking beyond the final layer.
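The abstract does not spell out how LACE is implemented, but the core idea (score each layer's confidence signal per language, then ensemble the best-calibrated layers) can be sketched. The following is a minimal, hypothetical sketch, not the authors' code: it assumes per-layer confidences have already been extracted on a small dev set (e.g., by applying the model's unembedding head to intermediate hidden states, logit-lens style, and taking the max softmax probability), and it measures calibration with the standard expected calibration error (ECE). All function names are illustrative.

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence, then take the
    bin-size-weighted gap between mean confidence and accuracy."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # First bin is closed on the left so conf == 0 is not dropped.
        mask = (conf >= lo if i == 0 else conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

def lace_select_layers(layer_conf, correct, k=3):
    """Pick the k best-calibrated layers for one language.
    layer_conf: (n_layers, n_examples) per-layer confidences on a dev set.
    correct:    (n_examples,) 0/1 correctness labels."""
    eces = [expected_calibration_error(c, correct) for c in layer_conf]
    return np.argsort(eces)[:k]

def lace_confidence(layer_conf, selected):
    """Ensemble confidence: average the selected layers' confidences."""
    return layer_conf[selected].mean(axis=0)

# Toy usage: 12 layers, 200 dev examples for one language.
rng = np.random.default_rng(0)
layer_conf = rng.uniform(0.3, 1.0, size=(12, 200))
correct = (rng.uniform(size=200) < 0.6).astype(float)
layers = lace_select_layers(layer_conf, correct, k=3)
conf = lace_confidence(layer_conf, layers)
```

Because layer selection happens per language on held-out data, this stays training-free: no model weights are updated, only which layers' signals are read out.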
Similar Papers
Calibration Across Layers: Understanding Calibration Evolution in LLMs
Machine Learning (CS)
Makes AI more honest about what it knows.
Cross-Lingual Stability and Bias in Instruction-Tuned Language Models for Humanitarian NLP
Computation and Language
Helps find human rights abuses in any language.
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
Computation and Language
Makes AI more honest about what it knows.