Reinforcement Learning for Long-Horizon Multi-Turn Search Agents
By: Vivek Kalyan, Martin Andrews
Potential Business Impact:
AI learns better by trying and failing.
Large Language Model (LLM) agents can leverage multiple turns and tools to solve complex tasks, with prompt-based approaches achieving strong performance. This work demonstrates that Reinforcement Learning (RL) can push capabilities significantly further by learning from experience. Through experiments on a legal document search benchmark, we show that our RL-trained 14-billion-parameter model outperforms frontier-class models (85% vs. 78% accuracy). In addition, we explore turn-restricted regimes during training and at test time, showing that these agents achieve better results when allowed to operate over longer multi-turn horizons.
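To make the "longer horizons help" finding concrete, here is a minimal toy sketch of a turn-limited multi-turn search loop. Everything below (the `run_agent` and `search` functions, the query-relaxation rule, the toy corpus) is illustrative and not from the paper: a real agent would have an LLM propose the next query each turn, and RL would train that policy.

```python
# Toy turn-limited search-agent loop. All names here are hypothetical
# illustrations, not the paper's actual implementation.

def search(query, corpus):
    """Toy search tool: return documents containing the query as a substring."""
    return [doc for doc in corpus if query.lower() in doc.lower()]

def run_agent(question, corpus, max_turns=4):
    """Issue up to `max_turns` search calls, relaxing the query each turn."""
    query = question
    for turn in range(1, max_turns + 1):
        hits = search(query, corpus)
        if hits:
            return {"answer": hits[0], "turns": turn}
        # No hits: drop the last query word -- a crude stand-in for the
        # LLM proposing a refined query on the next turn.
        query = " ".join(query.split()[:-1])
        if not query:
            break
    return {"answer": None, "turns": max_turns}

corpus = [
    "Clause 7: indemnification obligations survive termination.",
    "Clause 2: governing law is the State of Delaware.",
]
# With only 1 or 2 turns the agent fails; with 3 turns it recovers Clause 7.
result = run_agent("indemnification survive termination", corpus, max_turns=3)
```

The point of the sketch: under a tight turn budget the agent runs out of retries before its query matches, mirroring the paper's observation that turn-restricted regimes cap achievable accuracy.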
Similar Papers
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Computation and Language
Teaches AI to learn and solve problems better.
Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning
Computation and Language
Helps chatbots understand and adapt to changing conversations.
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
Computation and Language
Helps computers find answers by searching the web.