Score: 3

OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents

Published: January 26, 2026 | arXiv ID: 2601.18467v1

By: Yuhang Zhou, Kai Zheng, Qiguang Chen and more

BigTech Affiliations: Tencent

Potential Business Impact:

Trains smart computer helpers without expensive online learning.

Business Areas:

Artificial Intelligence, Data and Analytics, Science and Engineering, Software

Deep research agents have shown remarkable potential in handling long-horizon tasks. However, state-of-the-art performance typically relies on online reinforcement learning (RL), which is financially expensive due to extensive API calls. While offline training offers a more efficient alternative, its progress is hindered by the scarcity of high-quality research trajectories. In this paper, we demonstrate that expensive online reinforcement learning is not all you need to build powerful research agents. To bridge this gap, we introduce a fully open-source suite designed for effective offline training. Our core contributions include DeepForge, a ready-to-use task synthesis framework that generates large-scale research queries without heavy preprocessing; and a curated collection of 66k QA pairs, 33k SFT trajectories, and 21k DPO pairs. Leveraging these resources, we train OffSeeker (8B), a model developed entirely offline. Extensive evaluations across six benchmarks show that OffSeeker not only leads among similar-sized agents but also remains competitive with 30B-parameter systems trained via heavy online RL.

O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL

Computation and Language

Makes free AI smarter than paid AI.

7 Jan 2026 1

88%

Reinforcement Learning Foundations for Deep Research Systems: A Survey

Artificial Intelligence

Teaches AI to solve hard problems better.

8 Sep 2025 1

88%

DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

Artificial Intelligence

Helps computers learn to research the real internet.

4 Apr 2025 1


Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

24 pages
