BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models

Published: September 7, 2025 | arXiv ID: 2509.06040v1

By: Yuming Li, Yikai Wang, Yuying Zhu, and more

Potential Business Impact:

Trains AI image and video generators to match human preferences better, at roughly half the training cost.

Business Areas:
A/B Testing Data and Analytics

Recent advances in aligning image and video generative models via GRPO have achieved remarkable gains in human preference alignment. However, these methods still incur high computational costs from on-policy rollouts and excessive SDE sampling steps, as well as training instability due to sparse rewards. In this paper, we propose BranchGRPO, a novel method that introduces a branch sampling policy into the SDE sampling process. By sharing computation across common prefixes and pruning low-reward paths and redundant depths, BranchGRPO substantially lowers the per-update compute cost while maintaining or improving exploration diversity. This work makes three main contributions: (1) a branch sampling scheme that reduces rollout and training cost; (2) a tree-based advantage estimator incorporating dense process-level rewards; and (3) pruning strategies exploiting path and depth redundancy to accelerate convergence and boost performance. Experiments on image and video preference alignment show that BranchGRPO improves alignment scores by 16% over strong baselines while cutting training time by 50%.
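To make the branching idea concrete, here is a minimal, hypothetical Python sketch (not the authors' code): rollouts share a common prefix and fork at each depth, leaf rewards are aggregated upward to give dense process-level signals, siblings are normalized against one another as a tree analogue of GRPO's group-relative baseline, and low-reward subtrees are pruned. All names (Node, rollout_tree, tree_advantages, prune) and the toy reward model and SDE step are illustrative assumptions.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    state: float                 # stand-in for a latent at some denoising step
    depth: int
    children: list = field(default_factory=list)
    reward: float = 0.0          # aggregated, process-level reward

def leaf_reward(state):
    # Placeholder reward; in the paper this would be a learned
    # human-preference scorer applied to the final image/video.
    return -abs(state)

def rollout_tree(state, depth, max_depth, branch_factor, step):
    """Grow a branching rollout: one shared prefix, forking at each depth."""
    node = Node(state=state, depth=depth)
    if depth == max_depth:
        node.reward = leaf_reward(state)
        return node
    for _ in range(branch_factor):
        child_state = step(state)  # one stochastic SDE denoising step
        node.children.append(
            rollout_tree(child_state, depth + 1, max_depth, branch_factor, step)
        )
    # Dense process reward: aggregate child rewards back to this node.
    node.reward = sum(c.reward for c in node.children) / len(node.children)
    return node

def prune(node, keep_top):
    """Path pruning: keep only the highest-reward children at each depth."""
    node.children.sort(key=lambda c: c.reward, reverse=True)
    node.children = node.children[:keep_top]
    for c in node.children:
        prune(c, keep_top)

def tree_advantages(node, out):
    """Sibling-normalized advantages: each child is scored against its
    siblings, so every branching point supplies its own GRPO-style group."""
    if not node.children:
        return
    rewards = [c.reward for c in node.children]
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5 or 1.0
    for c in node.children:
        out.append(((node.depth, c.state), (c.reward - mean) / std))
        tree_advantages(c, out)

if __name__ == "__main__":
    step = lambda s: s + random.gauss(0.0, 0.5)  # toy SDE transition
    root = rollout_tree(0.0, depth=0, max_depth=3, branch_factor=3, step=step)
    prune(root, keep_top=2)
    advs = []
    tree_advantages(root, advs)
    print(f"{len(advs)} advantage-weighted transitions for the policy update")
```

The sibling normalization is what makes the reward signal dense rather than end-of-trajectory only: every internal node, not just the final sample, yields advantage-weighted transitions for the policy update, while prefix sharing and pruning are what cut the per-update rollout cost.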

Page Count
12 pages

Category
Computer Science:
CV and Pattern Recognition