A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning
By: Shashank Gupta, Chaitanya Ahuja, Tsung-Yu Lin, and more
Potential Business Impact:
Makes AI art generators better and faster.
Reinforcement learning (RL)-based fine-tuning has emerged as a powerful approach for aligning diffusion models with black-box objectives. Proximal policy optimization (PPO) is the most popular choice of method for policy optimization. While effective in terms of performance, PPO is highly sensitive to hyper-parameters and involves substantial computational overhead. REINFORCE, on the other hand, mitigates some of this computational complexity, such as the high memory overhead and sensitive hyper-parameter tuning, but has suboptimal performance due to high variance and sample inefficiency. While the variance of REINFORCE can be reduced by sampling multiple actions per input prompt and using a baseline correction term, it still suffers from sample inefficiency. To address these challenges, we systematically analyze the efficiency-effectiveness trade-off between REINFORCE and PPO, and propose leave-one-out PPO (LOOP), a novel RL method for diffusion fine-tuning. LOOP combines variance-reduction techniques from REINFORCE, such as sampling multiple actions per input prompt and a baseline correction term, with the robustness and sample efficiency of PPO via clipping and importance sampling. Our results demonstrate that LOOP effectively improves diffusion models on various black-box objectives and achieves a better balance between computational efficiency and performance.
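To make the described combination concrete, the following is a minimal PyTorch-style sketch of what a LOOP-like surrogate loss could look like, assuming K sampled trajectories per prompt with black-box rewards and per-trajectory log-probabilities under the current and sampling policies. The function and argument names (loop_surrogate_loss, log_prob_new, log_prob_old, clip_eps) are illustrative placeholders, not the authors' implementation.

```python
import torch

def loop_surrogate_loss(rewards, log_prob_new, log_prob_old, clip_eps=0.2):
    """Sketch of a leave-one-out baseline combined with a PPO-style clipped objective.

    rewards:       (B, K) black-box rewards for K samples per prompt (K >= 2)
    log_prob_new:  (B, K) log-probability of each sampled trajectory under the current policy
    log_prob_old:  (B, K) log-probability under the policy that generated the samples
    """
    B, K = rewards.shape

    # Leave-one-out baseline: for sample i, use the mean reward of the other K-1 samples.
    total = rewards.sum(dim=1, keepdim=True)
    baseline = (total - rewards) / (K - 1)
    advantages = rewards - baseline  # variance-reduced REINFORCE-style advantage

    # PPO-style importance ratio and clipping for sample reuse and robustness.
    ratio = torch.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```

In this sketch, the leave-one-out baseline supplies the variance reduction attributed to REINFORCE with multiple samples per prompt, while the importance ratio and clipping supply the PPO-style robustness when the same samples are reused over several gradient steps.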
Similar Papers
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
Machine Learning (CS)
Makes robots learn faster and better from mistakes.
A Practical Introduction to Deep Reinforcement Learning
Machine Learning (CS)
Teaches computers to learn and make smart choices.
Truncated Proximal Policy Optimization
Artificial Intelligence
Trains smart computer brains to solve problems faster.