Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
By: Xuechen Zhang, Zijian Huang, Chenshun Ni, and more
Potential Business Impact:
Makes smart computer programs think faster and shorter.
Recent research enhances language model reasoning by scaling test-time compute via longer chain-of-thought traces. This often improves accuracy but also introduces redundancy and high computational cost, especially for small language models distilled with supervised fine-tuning (SFT). In this work, we propose new algorithms that improve token-efficient reasoning with small-scale models by effectively trading off accuracy and computation. We first show that the post-SFT model fails to determine the optimal stopping point of the reasoning process, resulting in verbose and repetitive outputs. Verbosity also varies significantly between wrong and correct responses. To address these issues, we propose two solutions: (1) temperature scaling (TS) to control the stopping point of the thinking phase and thereby the trace length, and (2) TLDR, a length-regularized reinforcement learning method based on GRPO that facilitates multi-level trace length control (e.g., short, medium, and long reasoning). Experiments on four reasoning benchmarks, MATH500, AMC, AIME24, and OlympiadBench, demonstrate that TS is highly effective compared to s1's budget forcing approach, and that TLDR improves token efficiency by about 50% with minimal to no accuracy loss over the SFT baseline. Moreover, TLDR also enables flexible control over the response length, offering a practical and effective solution for token-efficient reasoning in small models. Ultimately, our work reveals the importance of stopping-time control, highlights shortcomings of pure SFT, and provides effective algorithmic recipes.
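The abstract only names the two ideas, so the following is a minimal, hypothetical sketch of what they might look like in code: temperature scaling applied to the end-of-thinking token's probability, and a length-penalized reward for GRPO-style training. The stop-token name, the per-level token budgets, the penalty weight `alpha`, and the penalty shape are all illustrative assumptions, not the paper's actual formulation.

```python
import math

def stop_probability(logits: dict[str, float], tau: float) -> float:
    """Probability of emitting the end-of-thinking token after temperature
    scaling. Dividing logits by tau < 1 sharpens the distribution, so a
    confident stop token becomes even more likely (earlier stopping);
    tau > 1 flattens it. The "</think>" token name is an assumption.
    """
    scaled = {tok: z / tau for tok, z in logits.items()}
    total = sum(math.exp(z) for z in scaled.values())
    return math.exp(scaled["</think>"]) / total

def length_regularized_reward(correct: bool, n_tokens: int,
                              level: str = "medium") -> float:
    """Combine a binary task reward with a penalty for exceeding a
    per-level token budget, enabling multi-level length control.
    Budgets and the weight alpha are assumed values for illustration.
    """
    budgets = {"short": 512, "medium": 1024, "long": 2048}
    budget = budgets[level]
    alpha = 0.5  # weight on the length penalty (assumed)
    task_reward = 1.0 if correct else 0.0
    # Penalize only the overflow beyond the budget, normalized by the budget,
    # so responses within budget are not discouraged.
    overflow = max(0, n_tokens - budget) / budget
    return task_reward - alpha * overflow
```

Under this sketch, a correct but over-long trace earns less reward than a correct trace within budget, which is one plausible way a GRPO-style objective could trade accuracy against token count.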
Similar Papers
When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents
Computation and Language
Teaches computers to think and use tools better.
SplitReason: Learning To Offload Reasoning
Computation and Language
Smart AI asks bigger AI for hard math help.
On the Role of Temperature Sampling in Test-Time Scaling
Artificial Intelligence
Makes AI smarter by trying different thinking styles.