Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
By: Davide Paglieri, Bartłomiej Cupiał, Jonathan Cook, and more
Potential Business Impact:
Helps AI agents decide when to think ahead during a task, saving compute.
Training large language models (LLMs) to reason via reinforcement learning (RL) significantly improves their problem-solving capabilities. In agentic settings, existing methods like ReAct prompt LLMs to explicitly plan before every action; however, we demonstrate that always planning is computationally expensive and degrades performance on long-horizon tasks, while never planning further limits performance. To address this, we introduce a conceptual framework formalizing dynamic planning for LLM agents, enabling them to flexibly decide when to allocate test-time compute for planning. We propose a simple two-stage training pipeline: (1) supervised fine-tuning on diverse synthetic data to prime models for dynamic planning, and (2) RL to refine this capability in long-horizon environments. Experiments on the Crafter environment show that dynamic planning agents trained with this approach are more sample-efficient and consistently achieve more complex objectives. Additionally, we demonstrate that these agents can be effectively steered by human-written plans, surpassing their independent capabilities. To our knowledge, this work is the first to explore training LLM agents for dynamic test-time compute allocation in sequential decision-making tasks, paving the way for more efficient, adaptive, and controllable agentic systems.
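The abstract's core mechanism, an agent that chooses step by step whether to spend extra test-time compute on planning before acting, can be illustrated with a short sketch. This is a hedged illustration only: it assumes a Gym-style environment API, and the function names, prompt format, and <plan> tag are hypothetical stand-ins, not the paper's actual interface.

```python
# Minimal sketch of a dynamic-planning agent loop, assuming a Gym-style
# environment and a text-in/text-out policy model. All names here
# (query_policy, parse_response, the "<plan>" tag) are illustrative
# assumptions, not the paper's published implementation.

def query_policy(prompt: str) -> str:
    """Stand-in for the fine-tuned LLM policy. A real implementation would
    call the model; here we return a fixed action so the sketch runs."""
    return "action: move_north"

def parse_response(text: str) -> tuple[str | None, str]:
    """Split an optional <plan>...</plan> block from the chosen action."""
    plan = None
    if "<plan>" in text and "</plan>" in text:
        start = text.index("<plan>") + len("<plan>")
        end = text.index("</plan>")
        plan = text[start:end].strip()
        text = text[end + len("</plan>"):]
    action = text.split("action:", 1)[-1].strip()
    return plan, action

def run_episode(env, max_steps: int = 100) -> float:
    """Dynamic planning: the policy itself decides, at each step, whether
    to spend test-time compute on a new plan or to act immediately."""
    obs, total_reward, plan = env.reset(), 0.0, ""
    for _ in range(max_steps):
        response = query_policy(
            f"Observation: {obs}\nCurrent plan: {plan or 'none'}\n"
            "Either write a <plan>...</plan> block and then an action, "
            "or act immediately."
        )
        new_plan, action = parse_response(response)
        if new_plan is not None:  # the agent chose to replan this step
            plan = new_plan
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

In the paper's two-stage pipeline, a policy like this would first be primed with supervised fine-tuning on synthetic traces that mix planning and non-planning steps, then refined with RL so that the decision to open a plan block is itself shaped by long-horizon reward.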
Similar Papers
Idea2Plan: Exploring AI-Powered Research Planning
Computation and Language
Helps computers plan science experiments from ideas.
Collaborative LLM Inference via Planning for Efficient Reasoning
Artificial Intelligence
Lets free AI models solve hard problems together.
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
Computation and Language
Helps computers plan and do complex tasks.