Language Matters: How Do Multilingual Input and Reasoning Paths Affect Large Reasoning Models?

Published: May 23, 2025 | arXiv ID: 2505.17407v1

By: Zhi Rui Tam, Cheng-Kuang Wu, Yu Ying Chiu and more

Potential Business Impact:

Computers think in English, not your language.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large reasoning models (LRMs) have demonstrated impressive performance across a range of reasoning tasks, yet little is known about their internal reasoning processes in multilingual settings. We begin with a critical question: "In which language do these models reason when solving problems presented in different languages?" Our findings reveal that, despite multilingual training, LRMs tend to default to reasoning in high-resource languages (e.g., English) at test time, regardless of the input language. When constrained to reason in the same language as the input, model performance declines, especially for low-resource languages. In contrast, reasoning in high-resource languages generally preserves performance. We conduct extensive evaluations across reasoning-intensive tasks (MMMLU, MATH-500) and non-reasoning benchmarks (CulturalBench, LMSYS-toxic), showing that the effect of language choice varies by task type: input-language reasoning degrades performance on reasoning tasks but benefits cultural tasks, while safety evaluations exhibit language-specific behavior. By exposing these linguistic biases in LRMs, our work highlights a critical step toward developing more equitable models that serve users across diverse linguistic backgrounds.