One-step Latent-free Image Generation with Pixel Mean Flows
By: Yiyang Lu, Susie Lu, Qiao Sun, and more
Potential Business Impact:
Makes computers draw pictures faster and better.
Modern diffusion/flow-based models for image generation typically exhibit two core characteristics: (i) multi-step sampling and (ii) operation in a latent space. Recent advances have made encouraging progress on each aspect individually, paving the way toward one-step diffusion/flow without latents. In this work, we take a further step toward this goal and propose "pixel MeanFlow" (pMF). Our core guideline is to formulate the network output space and the loss space separately: the network target is designed to lie on a presumed low-dimensional image manifold (i.e., x-prediction), while the loss is defined via MeanFlow in velocity space. We introduce a simple transformation between the image manifold and the average velocity field. In experiments, pMF achieves strong results for one-step latent-free generation on ImageNet at 256x256 resolution (2.22 FID) and 512x512 resolution (2.48 FID), filling a key missing piece in this regime. We hope that our study will further advance the boundaries of diffusion/flow-based generative models.
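The abstract describes the core mechanism only at a high level, so the following is a minimal sketch of how an x-prediction network could be trained with a velocity-space loss. It assumes a linear interpolation path z_t = (1 - t) x + t * noise, fixes the MeanFlow endpoint at r = 0, and uses a simplified velocity target; the function names (`x_to_avg_velocity`, `pmf_loss_sketch`) and these modeling choices are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def x_to_avg_velocity(x_hat, z_t, t, eps=1e-4):
    # Assumed transform: with z_t = (1 - t) * x + t * noise and
    # z_r = z_t - (t - r) * u, choosing r = 0 (the image endpoint) gives
    # u(z_t, 0, t) = (z_t - x_hat) / t.
    return (z_t - x_hat) / t.clamp(min=eps)


def avg_velocity_to_x(u, z_t, t):
    # Inverse transform: a one-step jump from z_t back to the image manifold.
    return z_t - t * u


def pmf_loss_sketch(net, x, noise, t):
    # net(z_t, t) -> x_hat, an image-space (x-prediction) output.
    t = t.view(-1, 1, 1, 1)
    z_t = (1.0 - t) * x + t * noise              # linear flow-matching path (assumed)
    x_hat = net(z_t, t.flatten())                # prediction on the image manifold
    u_pred = x_to_avg_velocity(x_hat, z_t, t)    # map to velocity space
    # Along the straight path, the average velocity from t down to 0 is
    # noise - x, so we regress u_pred against it. The full MeanFlow objective
    # instead builds the target from the average-velocity identity (via a JVP)
    # over general (r, t); this fixed-r case is a simplification.
    u_target = (noise - x).detach()
    return F.mse_loss(u_pred, u_target)


@torch.no_grad()
def one_step_sample(net, noise):
    # At t = 1, z_1 = noise and x = z_1 - 1 * u(z_1, 0, 1), which under the
    # assumed transform reduces to the network's x-prediction itself.
    t = torch.ones(noise.shape[0], device=noise.device)
    return net(noise, t)
```

The point of the sketch is the separation the abstract emphasizes: the network output stays on the image manifold (x-prediction), while the regression loss is applied after a simple transform into the average-velocity space.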
Similar Papers
Improved Mean Flows: On the Challenges of Fastforward Generative Models
CV and Pattern Recognition
Makes AI create pictures faster and better.
From Diffusion to One-Step Generation: A Comparative Study of Flow-Based Models with Application to Image Inpainting
CV and Pattern Recognition
Makes pictures from scratch in one step.
PixelFlow: Pixel-Space Generative Models with Flow
CV and Pattern Recognition
Makes computers create amazing pictures from words.