
PilotRL: Training Language Model Agents via Global Planning-Guided Progressive Reinforcement Learning

Published: August 1, 2025 | arXiv ID: 2508.00344v1

By: Keer Lu, Chong Chen, Bin Cui and more

Potential Business Impact:

Helps AI agents plan and act better.

Large Language Models (LLMs) have shown remarkable advancements in tackling agent-oriented tasks. Despite their potential, existing work faces challenges when deploying LLMs in agent-based environments. The widely adopted agent paradigm ReAct centers on integrating single-step reasoning with immediate action execution, which limits its effectiveness in complex tasks requiring long-term strategic planning. Furthermore, the coordination between the planner and executor during problem-solving is a critical factor in agent design. Additionally, current approaches rely predominantly on supervised fine-tuning, which often leads models to memorize established task completion trajectories, thereby restricting their ability to generalize to novel problem contexts. To address these challenges, we introduce AdaPlan, an adaptive global plan-based agent paradigm that aims to synergize high-level explicit guidance with execution to support effective long-horizon decision-making. Based on the proposed paradigm, we further put forward PilotRL, a global planning-guided training framework for LLM agents driven by progressive reinforcement learning. We first develop the model's ability to follow explicit guidance from global plans when addressing agent tasks. Subsequently, based on this foundation, we focus on optimizing the quality of generated plans. Finally, we conduct joint optimization of the model's planning and execution coordination. Experiments indicate that PilotRL achieves state-of-the-art performance, with LLaMA3.1-8B-Instruct + PilotRL surpassing the closed-source GPT-4o by 3.60%, while showing a more substantial gain of 55.78% compared to GPT-4o-mini at a comparable parameter scale.
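
The three-stage progression described in the abstract can be pictured as a training schedule that changes which signal is rewarded as training advances. The sketch below is only an illustration under stated assumptions: the abstract does not give reward definitions, stage lengths, or an RL algorithm, so the `Rollout` fields, the reward formulas, and the `steps_per_stage` parameter are all hypothetical placeholders, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rollout:
    """Hypothetical summary of one agent episode; every score lies in [0, 1]."""
    plan_adherence: float  # how closely the executed actions followed the global plan
    plan_quality: float    # how good the generated global plan was
    task_success: float    # whether the task was solved in the end


# Stage order follows the abstract: follow plans, then improve plans, then optimize jointly.
STAGES: List[str] = ["follow_plan", "improve_plan", "joint"]

# Placeholder reward functions (assumptions, not taken from the paper).
REWARDS: Dict[str, Callable[[Rollout], float]] = {
    "follow_plan": lambda r: r.plan_adherence,
    "improve_plan": lambda r: r.plan_quality,
    "joint": lambda r: 0.5 * (r.plan_adherence + r.plan_quality) * r.task_success,
}


def stage_for_step(step: int, steps_per_stage: int) -> str:
    """Map a training step to a stage; the final stage runs until training ends."""
    index = min(step // steps_per_stage, len(STAGES) - 1)
    return STAGES[index]


def reward(rollout: Rollout, step: int, steps_per_stage: int) -> float:
    """Score a rollout with the reward that belongs to the current stage."""
    return REWARDS[stage_for_step(step, steps_per_stage)](rollout)
```

With `steps_per_stage=100`, a rollout scored at step 0 is rewarded only for following the plan, one at step 150 only for plan quality, and any step from 200 onward uses the joint signal.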

PilotRL: Training Language Model Agents via Global Planning-Guided Progressive Reinforcement Learning

Computation and Language

Helps AI agents plan and solve harder problems.

1 Aug 2025 1

91%

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Computation and Language

Teaches AI to learn and solve problems better.

18 Nov 2025 1

90%

Subgoal Graph-Augmented Planning for LLM-Guided Open-World Reinforcement Learning

Machine Learning (CS)

Helps robots follow plans by checking steps.

26 Nov 2025 0


Page Count

22 pages
