BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
By: Yuming Li, Yikai Wang, Yuying Zhu, and more
Potential Business Impact:
Makes AI create better pictures and videos faster.
Recent advances in aligning image and video generative models via GRPO have yielded remarkable gains in human preference alignment. However, these methods still face high computational costs from on-policy rollouts and excessive SDE sampling steps, as well as training instability due to sparse rewards. In this paper, we propose BranchGRPO, a novel method that introduces a branch sampling policy restructuring the SDE sampling process into a branching tree. By sharing computation across common prefixes and pruning low-reward paths and redundant depths, BranchGRPO substantially lowers the per-update compute cost while maintaining or improving exploration diversity. This work makes three main contributions: (1) a branch sampling scheme that reduces rollout and training cost; (2) a tree-based advantage estimator incorporating dense process-level rewards; and (3) pruning strategies exploiting path and depth redundancy to accelerate convergence and boost performance. Experiments on image and video preference alignment show that BranchGRPO improves alignment scores by 16% over strong baselines while cutting training time by 50%.
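The three ingredients named in the abstract can be illustrated with a small simulation. Below is a minimal Python sketch, not the authors' implementation: rollouts share a common prefix, the trajectory splits into branches at a few chosen depths, low-reward siblings are pruned, and advantages are computed per sibling group from the rewards of descendant leaves. The names (`Node`, `sde_step`, `toy_reward`, `branch_depths`, `keep_ratio`), the scalar "latent", the specific branch depths, and the pruning rule are all illustrative placeholders assumed for this sketch.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node in the rollout tree: a partial denoising trajectory."""
    state: float                    # toy scalar stand-in for the latent at this step
    depth: int
    children: List["Node"] = field(default_factory=list)
    reward: float = 0.0             # filled in at the leaves by the reward model
    advantage: float = 0.0          # filled in by the tree-based estimator


def sde_step(state: float) -> float:
    """Placeholder for one stochastic (SDE) denoising step."""
    return state + random.gauss(0.0, 1.0)


def toy_reward(state: float) -> float:
    """Placeholder reward; stands in for a human-preference score on the sample."""
    return -abs(state - 3.0)


def expand(node: Node, num_steps: int, branch_depths: set,
           branch_factor: int, keep_ratio: float) -> None:
    """Grow the tree: children share their parent's prefix, the trajectory
    splits only at the chosen depths, and low-reward siblings are dropped
    (a stand-in for path pruning)."""
    if node.depth == num_steps:
        node.reward = toy_reward(node.state)
        return
    width = branch_factor if node.depth in branch_depths else 1
    children = [Node(sde_step(node.state), node.depth + 1) for _ in range(width)]
    if width > 1:
        children.sort(key=lambda c: toy_reward(c.state), reverse=True)
        children = children[: max(1, int(len(children) * keep_ratio))]
    node.children = children
    for child in children:
        expand(child, num_steps, branch_depths, branch_factor, keep_ratio)


def leaf_rewards(node: Node) -> List[float]:
    """Collect the rewards of all leaves below this node."""
    if not node.children:
        return [node.reward]
    return [r for c in node.children for r in leaf_rewards(c)]


def backfill_advantages(node: Node) -> float:
    """Tree-based advantage sketch: a node's value is the mean reward of its
    descendant leaves; each child's advantage is its value minus the mean
    value of its sibling group."""
    leaves = leaf_rewards(node)
    value = sum(leaves) / len(leaves)
    if node.children:
        child_values = [backfill_advantages(c) for c in node.children]
        mean = sum(child_values) / len(child_values)
        for child, v in zip(node.children, child_values):
            child.advantage = v - mean
    return value


if __name__ == "__main__":
    random.seed(0)
    root = Node(state=0.0, depth=0)
    expand(root, num_steps=6, branch_depths={1, 3},
           branch_factor=4, keep_ratio=0.5)
    backfill_advantages(root)
    print("leaf rewards:", [round(r, 2) for r in leaf_rewards(root)])
```

Because siblings share every step up to their split point, the cost of a tree with many leaves is far below that of the same number of independent rollouts, which is the intuition behind the reported reduction in per-update compute.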
Similar Papers
Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models
CV and Pattern Recognition
Makes AI art look more like what people want.
Growing with the Generator: Self-paced GRPO for Video Generation
CV and Pattern Recognition
Makes AI videos better by learning as it goes.