Reinforcement Learning for Long-Horizon Multi-Turn Search Agents
By: Vivek Kalyan, Martin Andrews
Potential Business Impact:
AI learns better by trying and failing.
Large Language Model (LLM) agents can leverage multiple turns and tools to solve complex tasks, and prompt-based approaches already achieve strong performance. This work demonstrates that Reinforcement Learning (RL) can push capabilities significantly further by learning from experience. Through experiments on a legal document search benchmark, we show that our RL-trained 14-billion-parameter model outperforms frontier-class models (85% vs. 78% accuracy). In addition, we explore turn-restricted regimes, both during training and at test time, and show that these agents achieve better results when allowed to operate over longer multi-turn horizons.
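To make the setup concrete, below is a minimal sketch of a multi-turn search-agent rollout with a configurable turn budget, the kind of loop the turn-restricted regimes above would constrain. This is illustrative only, not the authors' code: the policy/tool callables, message schema, and function names are assumptions for exposition.

```python
"""Illustrative sketch of a turn-limited multi-turn search-agent rollout.

Not the authors' implementation; `policy` and `search_tool` are hypothetical
stand-ins for the LLM and the legal-document search tool.
"""
from typing import Callable


def run_episode(question: str,
                policy: Callable[[list[dict]], dict],
                search_tool: Callable[[str], str],
                max_turns: int) -> list[dict]:
    """Roll out the agent for at most `max_turns` assistant turns.

    The policy sees the full transcript and returns either
    {"type": "search", "content": <query>} or {"type": "answer", "content": <text>}.
    """
    transcript = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        action = policy(transcript)
        transcript.append({"role": "assistant", "content": action["content"]})
        if action["type"] == "answer":
            break  # the agent chose to stop before exhausting its turn budget
        # Otherwise issue a search-tool call and append the observation.
        transcript.append({"role": "tool", "content": search_tool(action["content"])})
    return transcript


if __name__ == "__main__":
    # Dummy policy: search once, then answer; dummy tool echoes the query.
    def dummy_policy(transcript):
        if any(m["role"] == "tool" for m in transcript):
            return {"type": "answer", "content": "final answer"}
        return {"type": "search", "content": "termination clause precedent"}

    print(run_episode("Which clause governs termination?", dummy_policy,
                      lambda q: f"results for: {q}", max_turns=8))
```

In an RL training loop, rollouts collected under different `max_turns` budgets could be scored (e.g., answer accuracy against gold labels) to supply the reward signal, which is one way the train-time and test-time turn restrictions described above could be compared.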
Similar Papers
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Computation and Language
Teaches AI to learn and solve problems better.
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
Computation and Language
Helps computers find answers by searching the web.
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Machine Learning (CS)
Teaches AI to solve hard problems by trying things.