MSARL: Decoupling Reasoning and Tool Use with Multi-Small-Agent Reinforcement Learning
By: Dayu Wang, Jiaye Yang, Weikang Li, and more
Potential Business Impact:
AI agents work together to solve math problems.
Recent advances in multi-agent systems highlight the potential of specialized small agents that collaborate via division of labor. Existing tool-integrated reasoning systems, however, often follow a single-agent paradigm in which one large model interleaves long-horizon reasoning with precise tool operations, leading to cognitive-load interference and unstable coordination. We present MSARL, a Multi-Small-Agent Reinforcement Learning framework that explicitly decouples reasoning from tool use. In MSARL, a Reasoning Agent decomposes problems and plans tool invocations, while multiple Tool Agents specialize in specific external tools, each trained via a combination of imitation learning and reinforcement learning with role-specific rewards. On mathematical problem solving with code execution, MSARL significantly improves reasoning stability and final-answer accuracy over single-agent baselines. Moreover, the architecture generalizes to diverse tool-use tasks, demonstrating that cognitive-role decoupling with small agents is a scalable blueprint for multi-agent AI design.
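To make the decoupling concrete, here is a minimal sketch of the control flow the abstract describes: a Reasoning Agent that only plans, and a specialized Tool Agent that only executes. All names (ReasoningAgent, ToolAgent, plan_tool_call, execute) are hypothetical illustrations, not the paper's actual interfaces, and the training loop (imitation learning plus RL with role-specific rewards) is omitted.

```python
# Illustrative sketch only; class and method names are assumptions, not from MSARL.
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str      # which specialized Tool Agent to invoke, e.g. "python"
    payload: str   # the delegated sub-task, e.g. a code snippet to run


class ToolAgent:
    """Small agent specialized in one external tool (here: code execution)."""

    def execute(self, payload: str) -> str:
        # In MSARL this agent would be trained with imitation learning + RL
        # under a role-specific reward (e.g. execution success); here we just
        # run the snippet in a throwaway namespace for illustration.
        scope: dict = {}
        exec(payload, scope)  # sandboxing omitted for brevity
        return str(scope.get("result"))


class ReasoningAgent:
    """Decomposes the problem and plans tool invocations; never runs tools itself."""

    def plan_tool_call(self, problem: str) -> ToolCall:
        # A real Reasoning Agent would be an LLM producing a plan; this stub
        # hard-codes one delegation just to show the decoupled control flow.
        return ToolCall(tool="python", payload="result = sum(range(1, 101))")


def solve(problem: str) -> str:
    reasoner = ReasoningAgent()
    tool_agents = {"python": ToolAgent()}
    call = reasoner.plan_tool_call(problem)               # reasoning: decide what to compute
    return tool_agents[call.tool].execute(call.payload)   # tool use: actually compute it


print(solve("What is the sum of the integers from 1 to 100?"))  # -> 5050
```

The point of the split is that the planner never touches execution details and the executor never reasons about the overall problem, which is the cognitive-role decoupling the paper argues reduces interference.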
Similar Papers
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
Artificial Intelligence
AI learns to research and think like a person.
MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games
Artificial Intelligence
Teaches AI to work together and win games.