Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router
By: Chenyang Shao, Xinyang Liu, Yutang Lin, and more
Potential Business Impact:
Uses small, cheap AI for easy steps and big AI only when needed, cutting costs.
Multi-step reasoning has proven essential for enhancing the problem-solving capabilities of Large Language Models (LLMs) by decomposing complex tasks into intermediate steps, either explicitly or implicitly. Extending the reasoning chain at test time, through deeper thought processes or broader exploration, can further improve performance, but often incurs substantial costs due to the explosion in token usage. Yet many reasoning steps are relatively simple and can be handled by more efficient smaller-scale language models (SLMs). This motivates hybrid approaches that allocate subtasks across models of varying capacities. However, realizing such collaboration requires accurate task decomposition and difficulty-aware subtask allocation, which is challenging. To address this, we propose R2-Reasoner, a novel framework that enables collaborative reasoning across heterogeneous LLMs by dynamically routing subtasks based on estimated complexity. At the core of our framework is a Reinforced Model Router, composed of a task decomposer and a subtask allocator. The task decomposer segments complex input queries into logically ordered subtasks, while the subtask allocator assigns each subtask to the most appropriate model, ranging from lightweight SLMs to powerful LLMs, balancing accuracy and efficiency. To train this router, we introduce a staged pipeline that combines supervised fine-tuning on task-specific datasets with the Group Relative Policy Optimization (GRPO) algorithm, enabling self-supervised refinement through iterative reinforcement learning. Extensive experiments across four challenging benchmarks demonstrate that R2-Reasoner reduces API costs by 86.85% while maintaining or surpassing baseline accuracy. Our framework paves the way for more cost-effective and adaptive LLM reasoning. The code is open-source at https://anonymous.4open.science/r/R2_Reasoner.
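A minimal sketch of the decompose-then-route loop the abstract describes, in Python. Everything here is a hypothetical stand-in: the paper's decomposer and allocator are trained language models, whereas this sketch uses a sentence-splitting decomposer, a length-based difficulty score, stub model callables, and an assumed routing threshold. The `grpo_advantage` helper only illustrates the group-relative reward normalization at the heart of GRPO, not the full training pipeline.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Subtask:
    text: str
    difficulty: float  # estimated in [0, 1]; produced by the trained allocator in the paper

def decompose(query: str) -> List[Subtask]:
    """Placeholder decomposer: split on sentence boundaries and score
    difficulty by length. The paper's decomposer is a fine-tuned LM."""
    parts = [p.strip() for p in query.split(".") if p.strip()]
    return [Subtask(p, min(len(p) / 200.0, 1.0)) for p in parts]

def route(subtasks: List[Subtask],
          slm: Callable[[str], str],
          llm: Callable[[str], str],
          threshold: float = 0.5) -> str:
    """Send easy subtasks to the SLM and hard ones to the LLM,
    threading each intermediate result into the next prompt."""
    context = ""
    for st in subtasks:
        model = slm if st.difficulty < threshold else llm
        context = model(f"{context}\nSolve: {st.text}")
    return context

def grpo_advantage(rewards: List[float]) -> List[float]:
    """Group-relative advantage used in GRPO-style training: each sampled
    routing plan's reward is normalized against its group's mean and std."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mu) / sigma for r in rewards]

# Usage with stub callables standing in for real SLM/LLM API calls.
slm = lambda prompt: f"[slm] {prompt[-40:]}"
llm = lambda prompt: f"[llm] {prompt[-40:]}"
print(route(decompose("Compute the area. Then explain the formula."), slm, llm))
print(grpo_advantage([1.0, 0.0, 0.5, 1.0]))
```

Under this sketch's assumptions, cost savings come from the threshold choice: the lower the estimated difficulty of a subtask, the more often the cheap SLM handles it, which is the accuracy/efficiency trade-off the learned allocator optimizes.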
Similar Papers
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Computation and Language
Makes smart AI faster and cheaper to use.
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
Computation and Language
Lets AI pick the best AI for each question.
Training Language Models to Reason Efficiently
Machine Learning (CS)
Makes smart computer programs think faster, cheaper.