Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction
By: Hongjin Kim, Jaewook Lee, Kiyoung Lee, and more
Potential Business Impact:
Improves AI's ability to understand and solve reasoning problems in Korean.
Large Language Models (LLMs) demonstrate strong reasoning and self-correction abilities in high-resource languages like English, but their performance remains limited in low-resource languages such as Korean. In this study, we investigate whether reinforcement learning (RL) can raise Korean reasoning ability to a level comparable to English. Our findings reveal that RL alone yields limited improvements when applied to models lacking inherent Korean reasoning capabilities. To address this, we explore several fine-tuning strategies and show that aligning the model's internal reasoning processes with Korean inputs, particularly by tuning Korean-specific neurons in early layers, is key to unlocking RL's effectiveness. We introduce a self-correction code-switching dataset to facilitate this alignment and observe significant performance gains on both mathematical reasoning and self-correction tasks. Ultimately, we conclude that the crucial factor in multilingual reasoning enhancement is not injecting new linguistic knowledge but effectively eliciting and aligning existing reasoning capabilities. Our study offers a new perspective on how internal translation and neuron-level tuning contribute to multilingual reasoning alignment in LLMs.
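To make the "tuning Korean-specific neurons in early layers" idea concrete, here is a minimal PyTorch/HuggingFace sketch of one way such neuron-level tuning could be set up: estimate per-neuron firing rates on small Korean and English probe sets, select neurons that fire noticeably more on Korean, and then restrict gradient flow to just those neurons. Everything specific here is an assumption for illustration, not the paper's recipe: the base model ("gpt2" as a stand-in), the probe sentences, the layer index, the "activation > 0" firing heuristic, and the 0.2 selection threshold.

```python
# A minimal sketch of language-specific neuron tuning, assuming a
# HuggingFace GPT-2-style causal LM. Model name, probes, layer index,
# and thresholds below are illustrative, not the paper's exact method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model
tok = AutoTokenizer.from_pretrained("gpt2")
model.eval()

korean_probe = ["수학 문제를 단계별로 풀어 보자.", "답을 다시 검토해 보자."]
english_probe = ["Let's solve the math problem step by step.",
                 "Let's double-check the answer."]

@torch.no_grad()
def firing_rate(texts, layer):
    """Per-neuron fraction of tokens whose post-GELU activation is > 0."""
    captured, rates, n_tokens = {}, None, 0
    # Hook the MLP activation module of the chosen transformer block.
    handle = model.transformer.h[layer].mlp.act.register_forward_hook(
        lambda mod, inp, out: captured.__setitem__("a", out))
    for t in texts:
        model(**tok(t, return_tensors="pt"))
        a = captured["a"].squeeze(0)            # (seq_len, 4 * hidden)
        fired = (a > 0).float().sum(dim=0)      # per-neuron firing counts
        rates = fired if rates is None else rates + fired
        n_tokens += a.shape[0]
    handle.remove()
    return rates / n_tokens

layer = 2                                        # an early layer (assumption)
ko = firing_rate(korean_probe, layer)
en = firing_rate(english_probe, layer)
korean_neurons = ((ko - en) > 0.2).nonzero().squeeze(-1)  # illustrative cutoff
print(f"{len(korean_neurons)} Korean-leaning neurons in layer {layer}")

# Freeze the whole model, then let gradients flow only through the
# selected neurons' input weights and biases via gradient masking.
for p in model.parameters():
    p.requires_grad = False
c_fc = model.transformer.h[layer].mlp.c_fc       # Conv1D: weight is (in, out)
mask_w = torch.zeros_like(c_fc.weight)
mask_w[:, korean_neurons] = 1.0                  # columns index MLP neurons
mask_b = torch.zeros_like(c_fc.bias)
mask_b[korean_neurons] = 1.0
c_fc.weight.requires_grad = True
c_fc.bias.requires_grad = True
c_fc.weight.register_hook(lambda g: g * mask_w)
c_fc.bias.register_hook(lambda g: g * mask_b)
# From here, a standard LM fine-tuning loop on Korean reasoning or
# self-correction data would update only the selected neurons.
```

Under this setup, a fine-tuning pass on Korean (or code-switched Korean-English) self-correction data would touch only the masked parameters, which matches the abstract's framing of eliciting and aligning existing reasoning rather than injecting new knowledge.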
Similar Papers
Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement
Computation and Language
Teaches computers to think better and faster.
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Artificial Intelligence
Makes computers learn new tricks, but not really.
A Survey of Reinforcement Learning for Large Reasoning Models
Computation and Language
Teaches computers to think and solve hard problems.