AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress
By: Zhiheng Xi, Chenyang Liao, Guanyu Li, and more
Potential Business Impact:
Helps AI make better choices step-by-step.
Despite rapid development, large language models (LLMs) still encounter challenges in multi-turn decision-making tasks (i.e., agent tasks) like web shopping and browser navigation, which require making a sequence of intelligent decisions based on environmental feedback. Previous work on LLM agents typically relies on elaborate prompt engineering or fine-tuning with expert trajectories to improve performance. In this work, we take a different perspective: we explore constructing process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process. Unlike LLM reasoning, where each step is scored for correctness, actions in agent tasks have no clear-cut correctness; instead, they should be evaluated by their proximity to the goal and the progress they make. Building on this insight, we propose a re-defined PRM for agent tasks, named AgentPRM, to capture both the interdependence between sequential decisions and their contribution to the final goal. This enables better progress tracking and a better exploration-exploitation balance. To scalably obtain labeled data for training AgentPRM, we employ a Temporal Difference-based (TD-based) estimation method combined with Generalized Advantage Estimation (GAE), which proves more sample-efficient than prior methods. Extensive experiments across different agentic tasks show that AgentPRM is over 8× more compute-efficient than baselines and demonstrates robust improvement when scaling up test-time compute. Moreover, we perform detailed analyses to show how our method works and offer further insights, e.g., applying AgentPRM to the reinforcement learning of LLM agents.
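To make the labeling step concrete, below is a minimal sketch of how per-step training targets could be derived from TD residuals combined with GAE, as the abstract describes. The function name `gae_step_labels`, the trajectory values, and the `gamma`/`lam` settings are illustrative assumptions, not the paper's actual implementation or reported hyperparameters.

```python
import numpy as np

def gae_step_labels(rewards, values, gamma=0.99, lam=0.95):
    """Compute per-step advantage labels via TD residuals + GAE.

    rewards: per-step rewards for one trajectory (often sparse in agent
             tasks, e.g. only the final step reflects task success).
    values:  value estimates V(s_0..s_{T-1}) plus one terminal bootstrap
             value appended at the end (len(values) == len(rewards) + 1).
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    # Sweep backward so credit from later steps flows to earlier ones.
    for t in reversed(range(T)):
        # TD residual: how much better the step turned out than expected.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD residuals.
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Example: a 4-step trajectory where only the final action is rewarded
# (the agent completes the task), with rough value estimates per state.
rewards = [0.0, 0.0, 0.0, 1.0]
values = [0.2, 0.3, 0.5, 0.8, 0.0]  # V(s_0..s_3) plus terminal bootstrap
print(gae_step_labels(rewards, values))
```

Computing the labels backward lets credit from a sparse final reward propagate to earlier decisions, which matches the abstract's point that agent actions should be scored by their progress toward the goal rather than by per-step correctness.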
Similar Papers
A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models
Computation and Language
Teaches computers to think step-by-step.
The Bidirectional Process Reward Model
Computation and Language
Helps AI check its thinking both ways.
GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning
Computation and Language
Fixes math problems by explaining each step.