Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models
By: Shiran Ge, Chenyi Huang, Yuang Ai, and more
Potential Business Impact:
Makes AI learn better by picking good examples.
Group Relative Policy Optimization (GRPO) is a powerful technique for aligning generative models, but its effectiveness is bottlenecked by the conflict between large group sizes and prohibitive computational costs. In this work, we investigate this trade-off through empirical studies, yielding two key observations. First, we discover a reward clustering phenomenon in which many trajectories collapse toward the group-mean reward, offering limited optimization value. Second, we design a heuristic strategy named Optimal Variance Filtering (OVF) and verify that a high-variance subset of trajectories selected by OVF can outperform the larger, unfiltered group. However, this static, post-sampling OVF approach still incurs substantial computational overhead, as it performs unnecessary sampling for trajectories that are ultimately discarded. To resolve this, we propose Pro-GRPO (Proactive GRPO), a novel dynamic framework that integrates latent feature-based trajectory pruning into the sampling process. By terminating reward-clustered trajectories early, Pro-GRPO reduces computational overhead. Leveraging this efficiency, Pro-GRPO employs an "Expand-and-Prune" strategy: it first expands the initial sampling group to maximize trajectory diversity, then applies multi-step OVF to the latents, avoiding prohibitive computational costs. Extensive experiments on both diffusion-based and flow-based models demonstrate the generality and effectiveness of our Pro-GRPO framework.
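To make the idea concrete, below is a minimal, runnable sketch of the Expand-and-Prune loop as the abstract describes it. The paper does not specify its latent-feature reward predictor, pruning schedule, or group sizes, so `reward_proxy`, `ovf_select`, the checkpoint steps, and the sizes here are hypothetical placeholders for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward_proxy(latents):
    """Hypothetical stand-in for a latent-feature reward predictor.
    Pro-GRPO prunes on latent features mid-sampling; here we fake a
    scalar score per trajectory so the sketch runs end to end."""
    return latents.mean(axis=1)

def ovf_select(scores, keep):
    """Optimal Variance Filtering (per the abstract): keep the `keep`
    trajectories farthest from the group-mean score, i.e. the subset
    with the highest reward variance."""
    deviation = np.abs(scores - scores.mean())
    return np.argsort(deviation)[-keep:]

def expand_and_prune(init_group=32, final_group=8, num_steps=20,
                     prune_steps=(5, 10, 15), dim=64):
    # Expand: start from a large initial group to maximize diversity.
    latents = rng.standard_normal((init_group, dim))
    alive = np.arange(init_group)
    for step in range(num_steps):
        # One (toy) denoising step for the surviving trajectories.
        latents = latents + 0.1 * rng.standard_normal(latents.shape)
        # Prune: at checkpoints, early-terminate reward-clustered
        # trajectories via multi-step OVF on the latents. The schedule
        # (halve the group at each checkpoint) is an assumption.
        if step in prune_steps and len(alive) > final_group:
            keep = max(final_group, len(alive) // 2)
            idx = ovf_select(reward_proxy(latents), keep)
            latents, alive = latents[idx], alive[idx]
    return alive, latents

survivors, final_latents = expand_and_prune()
rewards = reward_proxy(final_latents)  # placeholder for the terminal reward model
# Standard group-relative advantage, computed only over survivors.
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
```

The closing normalization is the usual group-relative advantage from GRPO; the computational saving in this sketch comes from pruned trajectories never running the remaining denoising steps, which is the mechanism the abstract credits for Pro-GRPO's efficiency.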
Similar Papers
Growing with the Generator: Self-paced GRPO for Video Generation
CV and Pattern Recognition
Makes AI videos better by learning as it goes.
Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards
CV and Pattern Recognition
Makes AI art match words and colors better.
On the Theory and Practice of GRPO: A Trajectory-Corrected Approach with Fast Convergence
Machine Learning (CS)
Teaches computers to learn better, faster.