Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation
By: Yifu Luo, Xinhao Hu, Keyu Fan, and more
Potential Business Impact:
Makes AI create better pictures from words.
Reinforcement learning (RL) has garnered increasing attention in text-to-image (T2I) generation. However, most existing RL approaches are tailored to either diffusion models or autoregressive models, overlooking an important alternative: masked generative models. In this work, we propose Mask-GRPO, the first method to incorporate Group Relative Policy Optimization (GRPO)-based RL into this overlooked paradigm. Our core insight is to redefine the transition probability, departing from current approaches, and to formulate the unmasking process as a multi-step decision-making problem. To further strengthen the method, we explore several useful strategies, including removing the KL constraint, applying the reduction strategy, and filtering out low-quality samples. Using Mask-GRPO, we substantially improve the base model Show-o on standard T2I benchmarks and in preference alignment, outperforming existing state-of-the-art approaches. The code is available at https://github.com/xingzhejun/Mask-GRPO
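To make the formulation concrete, below is a minimal, hypothetical PyTorch sketch of the kind of objective the abstract describes: rewards are normalized within a sampled group (so no learned critic is needed), each unmasking step contributes a transition log-probability, and, as the paper reports, no KL penalty is added. The function `mask_grpo_loss`, the `model(state_t)` interface, the `trajectories` layout, and all tensor shapes are illustrative assumptions, not the authors' actual code.

```python
import torch

def mask_grpo_loss(model, trajectories, rewards, clip_eps=0.2):
    """Group-relative policy-gradient loss over unmasking trajectories (sketch).

    trajectories: one list per sampled image; each element is a tuple
        (state_t, actions_t, old_logp_t) for one unmasking step, where
        `actions_t` holds the token ids chosen at the positions unmasked
        at step t and `old_logp_t` is that step's log-prob under the
        behavior policy that generated the sample.
    rewards: tensor of shape [group_size], one scalar per sampled image.
    """
    # Group-relative advantage: normalize rewards within the sampled group,
    # replacing a learned value function (the GRPO idea).
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    loss = torch.zeros(())
    for i, traj in enumerate(trajectories):
        for state_t, actions_t, old_logp in traj:
            # Assumed interface: logits for the positions unmasked at step t.
            logits = model(state_t)                      # [k_t, vocab]
            logp = torch.log_softmax(logits, dim=-1)
            # Step transition probability = product over the positions newly
            # unmasked at step t (the redefined transition in the abstract).
            new_logp = logp.gather(-1, actions_t[:, None]).sum()
            ratio = torch.exp(new_logp - old_logp)
            # PPO-style clipped surrogate; no KL term, since the paper
            # reports removing the KL constraint.
            surr = torch.min(
                ratio * adv[i],
                torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv[i],
            )
            loss = loss - surr
    return loss / len(trajectories)
```

The key design choice mirrored here is that each image's reward is judged only relative to the other samples drawn for the same prompt, which is what lets GRPO dispense with a critic network.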
Similar Papers
AR-GRPO: Training Autoregressive Image Generation Models via Reinforcement Learning
CV and Pattern Recognition
Makes AI create better, more realistic pictures.
Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
Artificial Intelligence
Makes AI draw better pictures and solve problems.
GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning
Machine Learning (CS)
Teaches AI to learn better from data.