DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
By: Rui Lu, Zhenyu Hou, Zihan Wang, and more
Potential Business Impact:
Helps computers find answers by searching the web.
Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents that solve complex, real-world tasks. Yet open LLMs still perform poorly in such settings, because they have limited capacity for long-horizon reasoning with browsing tools and because sufficiently difficult supervised data is lacking. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs' long-horizon reasoning with deep search. Experiments show that DeepDive-32B achieves a new competitive result among open-source models on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and contributes significantly to performance improvements across multiple benchmarks. We also observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.
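The abstract does not describe how hard questions are synthesized from knowledge graphs. Purely as an illustration, assuming a multi-hop random walk over the graph whose starting entity is then replaced by a vague description (both assumptions, not necessarily DeepDive's actual pipeline; the repository above is the authority), a toy sketch might look like this. The graph, relation names, and the functions `random_walk`, `path_to_question`, and `sample_question` are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical toy knowledge graph: entity -> list of (relation, target entity).
KnowledgeGraph = Dict[str, List[Tuple[str, str]]]
Triple = Tuple[str, str, str]

TOY_KG: KnowledgeGraph = {
    "Marie Curie": [("spouse", "Pierre Curie"), ("field", "Physics")],
    "Pierre Curie": [("educated_at", "University of Paris")],
    "University of Paris": [("located_in", "Paris")],
    "Paris": [],
    "Physics": [],
}


def random_walk(kg: KnowledgeGraph, start: str, hops: int,
                rng: random.Random) -> Optional[List[Triple]]:
    """Walk `hops` edges from `start` without revisiting entities.

    Returns the (head, relation, tail) triples, or None on a dead end.
    """
    path: List[Triple] = []
    visited = {start}
    current = start
    for _ in range(hops):
        candidates = [(r, t) for r, t in kg.get(current, []) if t not in visited]
        if not candidates:
            return None  # dead end before reaching the requested depth
        relation, nxt = rng.choice(candidates)
        path.append((current, relation, nxt))
        visited.add(nxt)
        current = nxt
    return path


def path_to_question(path: List[Triple], start_hint: str) -> Tuple[str, str]:
    """Turn a walk into a question; the start entity is shown only via a vague hint."""
    chain = ", then ".join(rel.replace("_", " ") for _, rel, _ in path)
    question = f"Start from {start_hint}. Follow: {chain}. Which entity do you reach?"
    answer = path[-1][2]  # the final entity of the walk is the answer
    return question, answer


def sample_question(kg: KnowledgeGraph, start: str, hops: int, start_hint: str,
                    seed: int = 0, max_tries: int = 20) -> Optional[Tuple[str, str]]:
    """Retry random walks (seeded for reproducibility) until one reaches full depth."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        path = random_walk(kg, start, hops, rng)
        if path is not None:
            return path_to_question(path, start_hint)
    return None


if __name__ == "__main__":
    print(sample_question(TOY_KG, "Pierre Curie", 2, "a French physicist born in 1859"))
```

Describing the start entity only vaguely is meant to keep the answer from being reachable by a single lookup. A realistic pipeline would presumably also need to verify that each question has a unique answer and is genuinely hard, which this sketch does not attempt.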
Similar Papers
Reinforcement Learning for Long-Horizon Multi-Turn Search Agents
Computation and Language
AI learns better by trying and failing.
DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments
Artificial Intelligence
Helps computers learn to research the real internet.
Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning
Computation and Language
Helps chatbots understand and adapt to changing conversations.