Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router
By: Chenyang Shao, Xinyang Liu, Yutang Lin, and more
Potential Business Impact:
Uses small, cheap AI for easy steps and big AI only when needed, cutting costs.
Multi-step reasoning has proven essential for enhancing the problem-solving capabilities of Large Language Models (LLMs) by decomposing complex tasks into intermediate steps, either explicitly or implicitly. Extending the reasoning chain at test time, through deeper thought processes or broader exploration, can further improve performance, but often incurs substantial costs due to the explosion in token usage. Yet many reasoning steps are relatively simple and can be handled by more efficient smaller-scale language models (SLMs). This motivates hybrid approaches that allocate subtasks across models of varying capacities. However, realizing such collaboration requires accurate task decomposition and difficulty-aware subtask allocation, which is challenging. To address this, we propose R2-Reasoner, a novel framework that enables collaborative reasoning across heterogeneous LLMs by dynamically routing subtasks based on estimated complexity. At the core of our framework is a Reinforced Model Router, composed of a task decomposer and a subtask allocator. The task decomposer segments complex input queries into logically ordered subtasks, while the subtask allocator assigns each subtask to the most appropriate model, ranging from lightweight SLMs to powerful LLMs, balancing accuracy and efficiency. To train this router, we introduce a staged pipeline that combines supervised fine-tuning on task-specific datasets with the Group Relative Policy Optimization (GRPO) algorithm, enabling self-supervised refinement through iterative reinforcement learning. Extensive experiments across four challenging benchmarks demonstrate that R2-Reasoner reduces API costs by 86.85% while maintaining or surpassing baseline accuracy. Our framework paves the way for more cost-effective and adaptive LLM reasoning. The code is open-source at https://anonymous.4open.science/r/R2_Reasoner.
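A minimal sketch of the decompose-then-route loop the abstract describes, in Python. Everything here is a hypothetical stand-in: the paper's decomposer and allocator are trained language models, whereas this sketch uses a sentence-splitting decomposer, a length-based difficulty score, stub model callables, and an assumed routing threshold. The `grpo_advantage` helper only illustrates the group-relative reward normalization at the heart of GRPO, not the full training pipeline.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Subtask:
    text: str
    difficulty: float  # estimated in [0, 1]; produced by the trained allocator in the paper

def decompose(query: str) -> List[Subtask]:
    """Placeholder decomposer: split on sentence boundaries and score
    difficulty by length. The paper's decomposer is a fine-tuned LM."""
    parts = [p.strip() for p in query.split(".") if p.strip()]
    return [Subtask(p, min(len(p) / 200.0, 1.0)) for p in parts]

def route(subtasks: List[Subtask],
          slm: Callable[[str], str],
          llm: Callable[[str], str],
          threshold: float = 0.5) -> str:
    """Send easy subtasks to the SLM and hard ones to the LLM,
    threading each intermediate result into the next prompt."""
    context = ""
    for st in subtasks:
        model = slm if st.difficulty < threshold else llm
        context = model(f"{context}\nSolve: {st.text}")
    return context

def grpo_advantage(rewards: List[float]) -> List[float]:
    """Group-relative advantage used in GRPO-style training: each sampled
    routing plan's reward is normalized against its group's mean and std."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mu) / sigma for r in rewards]

# Usage with stub callables standing in for real SLM/LLM API calls.
slm = lambda prompt: f"[slm] {prompt[-40:]}"
llm = lambda prompt: f"[llm] {prompt[-40:]}"
print(route(decompose("Compute the area. Then explain the formula."), slm, llm))
print(grpo_advantage([1.0, 0.0, 0.5, 1.0]))
```

Under this sketch's assumptions, cost savings come from the threshold choice: the lower the estimated difficulty of a subtask, the more often the cheap SLM handles it, which is the accuracy/efficiency trade-off the learned allocator optimizes.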
Similar Papers
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Computation and Language
Makes smart AI faster and cheaper to use.
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
Computation and Language
Lets AI pick the best AI for each question.
Training Language Models to Reason Efficiently
Machine Learning (CS)
Makes smart computer programs think faster, cheaper.