PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations
By: Chengcheng Guo, Kuo Cai, Yu Zhou, and more
Potential Business Impact:
Improves recommendations by checking each generation step, so smaller models can match larger ones by spending more inference compute.
Generative Recommendation has emerged as a promising paradigm, reformulating recommendation as a sequence-to-sequence generation task over hierarchical Semantic IDs. However, existing methods suffer from a critical issue we term Semantic Drift, where errors in early, high-level tokens irreversibly divert the generation trajectory into irrelevant semantic subspaces. Inspired by Process Reward Models (PRMs) that enhance reasoning in Large Language Models, we propose Promise, a novel framework that integrates dense, step-by-step verification into generative models. Promise features a lightweight PRM to assess the quality of intermediate inference steps, coupled with a PRM-guided Beam Search strategy that leverages dense feedback to dynamically prune erroneous branches. Crucially, our approach unlocks Test-Time Scaling Laws for recommender systems: by increasing inference compute, smaller models can match or surpass larger models. Extensive offline experiments and online A/B tests on a large-scale platform demonstrate that Promise effectively mitigates Semantic Drift, significantly improving recommendation accuracy while enabling efficient deployment.
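To make the PRM-guided Beam Search idea concrete, here is a minimal sketch of how dense step-level rewards can reshape beam pruning over hierarchical Semantic IDs. The paper does not publish its implementation, so the function names (score_token, prm_score), the blending weight alpha, and the toy scorers below are illustrative assumptions, not the authors' actual API.

```python
# Sketch of PRM-guided beam search over hierarchical Semantic IDs.
# Names and the alpha-weighted blend are assumptions for illustration only.
import math
from typing import Callable, List, Tuple

def prm_guided_beam_search(
    score_token: Callable[[Tuple[int, ...]], List[Tuple[int, float]]],
    prm_score: Callable[[Tuple[int, ...]], float],
    depth: int,
    beam_width: int = 4,
    alpha: float = 0.5,
) -> List[Tuple[Tuple[int, ...], float]]:
    """Expand beams level by level; rerank each partial Semantic ID prefix
    by a blend of generator log-probability and the PRM's step reward."""
    beams: List[Tuple[Tuple[int, ...], float]] = [((), 0.0)]
    for _ in range(depth):
        candidates = []
        for prefix, score in beams:
            for token, token_logp in score_token(prefix):
                new_prefix = prefix + (token,)
                # Dense step-level feedback: the PRM judges the partial
                # trajectory, so branches that drift semantically can be
                # pruned before the error becomes irreversible.
                reward = prm_score(new_prefix)
                candidates.append((new_prefix, score + token_logp + alpha * reward))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

if __name__ == "__main__":
    # Toy generator: two candidate tokens per level with fixed log-probs.
    def score_token(prefix):
        return [(0, math.log(0.6)), (1, math.log(0.4))]

    # Toy PRM: prefers prefixes that stay on token 0 (stand-in for "on-topic").
    def prm_score(prefix):
        return 1.0 if all(t == 0 for t in prefix) else -1.0

    for seq, total in prm_guided_beam_search(score_token, prm_score, depth=3):
        print(seq, round(total, 3))
```

Under this reading, test-time scaling comes from turning up beam_width (or depth of reranking): more candidate prefixes get scored by the PRM per query, trading inference compute for accuracy.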
Similar Papers
AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress
Computation and Language
Helps AI make better choices step-by-step.
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
Computation and Language
Makes AI better at solving math problems.
From Mathematical Reasoning to Code: Generalization of Process Reward Models in Test-Time Scaling
Computation and Language
Helps computers solve problems better with feedback.