Fine-tuning Flow Matching Generative Models with Intermediate Feedback
By: Jiajun Fan, Chaoran Cheng, Shuaike Shen and more
Potential Business Impact:
Makes AI pictures better match your words.
Flow-based generative models have shown remarkable success in text-to-image generation, yet fine-tuning them with intermediate feedback remains challenging, especially for continuous-time flow matching models. Most existing approaches learn solely from outcome rewards and therefore struggle with the credit assignment problem. Alternative methods that learn a critic by direct regression on cumulative rewards often suffer from training instability and model collapse in online settings. We present AC-Flow, a robust actor-critic framework that addresses these challenges through three key innovations: (1) reward shaping that provides well-normalized learning signals to enable stable intermediate value learning and gradient control, (2) a novel dual-stability mechanism that combines advantage clipping, which prevents destructive policy updates, with a warm-up phase that allows the critic to mature before it influences the actor, and (3) a scalable generalized critic weighting scheme that extends traditional reward-weighted methods while preserving model diversity through Wasserstein regularization. Through extensive experiments on Stable Diffusion 3, we demonstrate that AC-Flow achieves state-of-the-art performance in text-to-image alignment tasks and generalizes to unseen human preference models. Our results show that, even with a computationally efficient critic model, flow models can be robustly fine-tuned without compromising generative quality, diversity, or stability.
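To make the three ingredients named in the abstract more concrete, here is a minimal Python sketch of how normalized rewards, a clipped advantage, and a critic warm-up could combine into per-sample actor weights. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the batch z-score shaping, the exponential weighting, the fallback to shaped rewards during warm-up, and every name and default (`shape_rewards`, `actor_weights`, `warmup_steps`, `bound`, `temperature`) are hypothetical. The Wasserstein regularization term is not modeled.

```python
import math


def shape_rewards(rewards, eps=1e-8):
    """Z-score rewards within a batch (an ASSUMED form of reward shaping)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def clip(x, bound):
    """Clamp x to the interval [-bound, bound]."""
    return max(-bound, min(bound, x))


def actor_weights(rewards, values, step,
                  warmup_steps=100, bound=1.0, temperature=1.0):
    """Return per-sample weights for an actor loss, normalized to mean 1.

    ASSUMPTIONS (not taken from the paper):
      * during warm-up the critic is ignored and the shaped reward is used;
      * after warm-up the signal is the clipped advantage (shaped reward
        minus the critic's value estimate);
      * weights are exp(signal / temperature), in the spirit of
        reward-weighted methods.
    """
    shaped = shape_rewards(rewards)
    if step < warmup_steps:
        # Critic has not matured yet: do not let it influence the actor.
        signal = [clip(s, bound) for s in shaped]
    else:
        # Clipped advantage bounds the size of any single policy update.
        signal = [clip(s - v, bound) for s, v in zip(shaped, values)]
    raw = [math.exp(s / temperature) for s in signal]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


if __name__ == "__main__":
    rewards = [1.0, 2.0, 3.0, 4.0]
    critic_values = [0.0, 0.0, 0.0, 0.0]
    print(actor_weights(rewards, critic_values, step=0))    # warm-up phase
    print(actor_weights(rewards, critic_values, step=500))  # critic active
```

In this sketch the clipping bound caps the ratio between the largest and smallest weight at exp(2 * bound / temperature), which is one simple way to keep a single outlier sample from dominating an update.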
Similar Papers
Value Gradient Guidance for Flow Matching Alignment
Machine Learning (CS)
Makes AI art creation faster and better.
Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning
Machine Learning (CS)
Teaches computers to create new things better.
FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning
Machine Learning (CS)
Teaches computers to learn better by guessing values.