TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
By: Zheng Ding, Weirui Ye
Potential Business Impact:
Trains AI to make better pictures much faster.
Reinforcement learning (RL) post-training is crucial for aligning generative models with human preferences, but its prohibitive computational cost remains a major barrier to widespread adoption. We introduce TreeGRPO, a novel RL framework that dramatically improves training efficiency by recasting the denoising process as a search tree. From shared initial noise samples, TreeGRPO strategically branches to generate multiple candidate trajectories while efficiently reusing their common prefixes. This tree-structured approach delivers three key advantages: (1) high sample efficiency, achieving better performance with the same number of training samples; (2) fine-grained credit assignment via reward backpropagation that computes step-specific advantages, overcoming the uniform credit assignment of trajectory-based methods; and (3) amortized computation, where multi-child branching enables multiple policy updates per forward pass. Extensive experiments on both diffusion and flow-based models demonstrate that TreeGRPO achieves 2.4× faster training while establishing a superior Pareto frontier in the efficiency-reward trade-off space. Our method consistently outperforms GRPO baselines across multiple benchmarks and reward models, providing a scalable and effective pathway for RL-based alignment of visual generative models. The project website is available at treegrpo.github.io.
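The abstract does not spell out how step-specific advantages are computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that leaf rewards are averaged upward through the tree and that each branch is scored against the mean of its siblings. The names `Node`, `backup`, and `assign_advantages` are hypothetical, and the rewards are toy values.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class Node:
    """One denoising state; children are alternative next steps from it."""
    reward: Optional[float] = None   # set only on leaves (final samples)
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0               # filled in by backup()
    advantage: float = 0.0           # filled in by assign_advantages()


def backup(node: Node) -> float:
    """Propagate leaf rewards upward (assumed rule: mean over children)."""
    if not node.children:
        node.value = float(node.reward)
    else:
        node.value = mean(backup(child) for child in node.children)
    return node.value


def assign_advantages(node: Node) -> None:
    """Score each child against the mean value of its siblings (assumed rule)."""
    if not node.children:
        return
    baseline = mean(child.value for child in node.children)
    for child in node.children:
        child.advantage = child.value - baseline
        assign_advantages(child)


# Toy tree: one shared noise sample, two branch points, four final samples.
leaves = [Node(reward=r) for r in (1.0, 3.0, 2.0, 6.0)]
left = Node(children=leaves[:2])
right = Node(children=leaves[2:])
root = Node(children=[left, right])

backup(root)
assign_advantages(root)

print(root.value)                      # mean reward over the whole tree
print(left.advantage, right.advantage)  # advantages of the first branch
print([leaf.advantage for leaf in leaves])
```

In this sketch the two first-level branches share the root prefix, so the four final samples cost fewer denoising steps than four independent trajectories would. Each edge also carries its own advantage, whereas a trajectory-level baseline would assign a single advantage to every step of a sample.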
Similar Papers
Tree Search for LLM Agent Reinforcement Learning
Machine Learning (CS)
Teaches AI to learn better from mistakes.
Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards
CV and Pattern Recognition
Makes AI art match words and colors better.
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
CV and Pattern Recognition
Makes AI create better pictures and videos faster.