TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning
By: Huiyuan Lai, Malvina Nissim
Potential Business Impact:
Teaches computers to solve math problems faster and smarter.
Large Language Models (LLMs) have shown remarkable performance on complex reasoning tasks, especially when equipped with long chain-of-thought (CoT) reasoning. However, eliciting long CoT typically requires large-scale reinforcement learning (RL) training and often leads to overthinking, with redundant intermediate steps. To improve learning and reasoning efficiency while preserving or even enhancing performance, we propose TACLer, a model-tailored curriculum reinforcement learning framework that gradually increases the complexity of the training data based on the model's proficiency across multi-stage RL training. TACLer features two core components: (i) tailored curriculum learning, which determines what knowledge the model lacks and needs to learn at each progressive stage; and (ii) a hybrid Thinking/NoThinking reasoning paradigm, which balances accuracy and efficiency by enabling or disabling the Thinking mode. Our experiments show that TACLer yields a twofold advantage in learning and reasoning: (i) it reduces computational cost, cutting training compute by over 50% compared to long-thinking models and reducing inference token usage by over 42% relative to the base model; and (ii) it improves accuracy by over 9% over the base model, consistently outperforming state-of-the-art NoThinking and Thinking baselines across four math datasets with complex problems.
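To make the two components concrete, below is a minimal Python sketch of a TACLer-style multi-stage loop. The abstract does not specify how stages are sized, how proficiency is measured, or when Thinking is enabled, so the helper names (`estimate_solve_rate`, `rl_train_stage`), the difficulty annotation, and the stage heuristics here are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of a tailored-curriculum RL loop in the spirit of TACLer.
# All helper names and heuristics below are assumptions for illustration.

from dataclasses import dataclass
from typing import List

@dataclass
class Problem:
    prompt: str
    difficulty: int  # assumed annotation, e.g., 1 (easy) .. 5 (hard)

def estimate_solve_rate(model, problem: Problem, n_samples: int = 8) -> float:
    """Hypothetical proficiency probe: fraction of sampled answers that are correct."""
    ...  # placeholder; would sample n_samples answers from the model and grade them
    return 0.0

def rl_train_stage(model, problems: List[Problem], thinking: bool):
    """Hypothetical RL update on one curriculum stage.

    `thinking` toggles the long chain-of-thought mode, reflecting the hybrid
    Thinking/NoThinking paradigm described in the abstract.
    """
    ...  # placeholder for the actual RL training loop
    return model

def tacler_curriculum(model, pool: List[Problem], n_stages: int = 3,
                      mastered_threshold: float = 0.9):
    """Run multi-stage RL, each stage tailored to what the model still lacks."""
    for stage in range(n_stages):
        # Tailoring: keep only problems the current model does NOT reliably solve,
        # so each stage targets the model's knowledge gaps.
        unmastered = [p for p in pool
                      if estimate_solve_rate(model, p) < mastered_threshold]

        # Curriculum: feed easier unmastered problems first, harder ones later.
        unmastered.sort(key=lambda p: p.difficulty)
        stage_data = unmastered[: max(1, len(unmastered) // (n_stages - stage))]

        # Hybrid-mode heuristic (an assumption): use cheap NoThinking on early
        # stages and reserve long Thinking for the final, hardest stage.
        use_thinking = stage == n_stages - 1
        model = rl_train_stage(model, stage_data, thinking=use_thinking)
    return model
```

The key design point this sketch tries to capture is that the curriculum is tailored at each stage from the model's current failures rather than fixed in advance, and that long Thinking is spent selectively, which is how the framework can cut both training compute and inference tokens.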
Similar Papers
Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code
Computation and Language
Teaches computers to think better using code.
Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning
Computation and Language
Makes AI think less to solve problems faster.
Interleaved Reasoning for Large Language Models via Reinforcement Learning
Computation and Language
Makes smart computers answer questions faster.