SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Published: September 2, 2025 | arXiv ID: 2509.02479v2

By: Zhenghai Xue, Longtao Zheng, Qian Liu and more

Potential Business Impact:

Makes AI better at solving hard math problems.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large Language Models (LLMs) can significantly improve their reasoning capabilities by interacting with external tools, a paradigm known as Tool-Integrated Reasoning (TIR). However, extending TIR to multi-turn scenarios using Reinforcement Learning (RL) is often hindered by training instability and performance collapse. We identify that such instability is primarily caused by a distributional drift from external tool feedback, leading to the generation of low-probability tokens. This issue compounds over successive turns, causing catastrophic gradient norm explosions that derail the training process. To address this challenge, we introduce SimpleTIR, a plug-and-play algorithm that stabilizes multi-turn TIR training. Its core strategy is to identify and filter out trajectories containing void turns, i.e., turns that yield neither a code block nor a final answer. By removing these problematic trajectories from the policy update, SimpleTIR effectively blocks the harmful, high-magnitude gradients, thus stabilizing the learning dynamics. Extensive experiments show that SimpleTIR achieves state-of-the-art performance on challenging math reasoning benchmarks, notably elevating the AIME24 score from a text-only baseline of 22.1 to 50.5 when starting from the Qwen2.5-7B base model. Furthermore, by avoiding the constraints of supervised fine-tuning, SimpleTIR encourages the model to discover diverse and sophisticated reasoning patterns, such as self-correction and cross-validation.
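The void-turn filter described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Trajectory` class, the function names, and the detection rules (a fenced code block counts as code, a `\boxed{...}` expression counts as a final answer) are all assumptions made for illustration only.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed output conventions (hypothetical, not taken from the paper):
# code is wrapped in a triple-backtick fence, final answers use \boxed{...}.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\n.*?" + FENCE, re.DOTALL)
FINAL_ANSWER = re.compile(r"\\boxed\{")


@dataclass
class Trajectory:
    """One multi-turn rollout: the model's text for each turn plus a scalar reward."""
    turns: List[str]
    reward: float


def is_void_turn(turn: str) -> bool:
    """A turn is void if it yields neither a code block nor a final answer."""
    has_code = CODE_BLOCK.search(turn) is not None
    has_answer = FINAL_ANSWER.search(turn) is not None
    return not (has_code or has_answer)


def filter_void_trajectories(batch: List[Trajectory]) -> List[Trajectory]:
    """Drop every trajectory that contains at least one void turn."""
    return [t for t in batch if not any(is_void_turn(turn) for turn in t.turns)]
```

In a training loop, a filter of this kind would presumably run on each sampled batch before the policy-gradient loss is computed, so that the excluded trajectories contribute no gradient.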

Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents

Computation and Language

Teaches computers to use tools with voice commands.

17 Sep 2025 3

MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL

Artificial Intelligence

Teaches computers to understand questions and find answers.

29 Oct 2025 1

Empowering Multi-Turn Tool-Integrated Reasoning with Group Turn Policy Optimization

Machine Learning (CS)

Teaches AI to solve math problems step-by-step.

18 Nov 2025 2

Page Count

22 pages
