Efficient RL Training for Reasoning Models via Length-Aware Optimization
By: Danlong Yuan, Tian Xie, Shaohan Huang, and more
Potential Business Impact:
Makes AI reasoning models answer faster, using less compute.
Large reasoning models, such as OpenAI o1 or DeepSeek R1, have demonstrated remarkable performance on reasoning tasks but often produce long reasoning paths that incur significant memory and time costs. Existing methods primarily aim to shorten reasoning paths by introducing additional training data and stages. In this paper, we propose three critical reward designs integrated directly into the reinforcement learning process of large reasoning models, which reduce response length without extra training stages. Experiments on four settings show that our method significantly decreases response length while maintaining or even improving performance. Specifically, in a logic reasoning setting, we achieve a 40% reduction in response length averaged over training steps, alongside a 14% gain in performance. For math problems, we reduce response length averaged over training steps by 33% while preserving performance.
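The abstract does not spell out the three reward designs. As a rough illustration only, the sketch below shows one generic way a length-aware term can be folded into the per-response reward during RL training, with no extra training stage; the function name, penalty form, and the `alpha` weight are assumptions for illustration, not the paper's method.

```python
# Minimal sketch of a generic length-aware reward shaping term for RL
# fine-tuning of a reasoning model. This is NOT the paper's actual reward
# design; names and the penalty form are illustrative assumptions.

from typing import List


def length_aware_rewards(
    correct: List[bool],   # whether each sampled response solves the task
    lengths: List[int],    # token length of each sampled response
    alpha: float = 0.1,    # weight of the length term (assumed hyperparameter)
) -> List[float]:
    """Combine task correctness with a per-batch normalized length penalty."""
    max_len = max(lengths) if lengths else 1
    rewards = []
    for ok, n in zip(correct, lengths):
        base = 1.0 if ok else 0.0
        # Penalize long responses relative to the longest one in the batch,
        # and only when the answer is correct, so that shortening never
        # trades away accuracy in this toy formulation.
        penalty = alpha * (n / max_len) if ok else 0.0
        rewards.append(base - penalty)
    return rewards


if __name__ == "__main__":
    # Two correct responses of different lengths and one incorrect response.
    print(length_aware_rewards([True, True, False], [120, 600, 300]))
```

In such a formulation the length term is computed from quantities already available in the RL rollout (correctness and token counts), which is why no additional data or training stage is needed.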
Similar Papers
Concise Reasoning via Reinforcement Learning
Computation and Language
Makes AI give shorter, smarter answers.
Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty
Computation and Language
Makes AI think faster and smarter on tests.
Optimizing Length Compression in Large Reasoning Models
Artificial Intelligence
Makes AI think smarter, not longer.