Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms
By: Jiaming Song, Linqi Zhou
Potential Business Impact:
Makes AI that creates pictures and words much faster to run.
Recent years have seen significant advancements in foundation models through generative pre-training, yet algorithmic innovation in this space has largely stagnated around autoregressive models for discrete signals and diffusion models for continuous signals. This stagnation creates a bottleneck that prevents us from fully unlocking the potential of rich multimodal data, which in turn limits progress toward multimodal intelligence. We argue that an inference-first perspective, which prioritizes scaling efficiency during inference time across sequence length and refinement steps, can inspire novel generative pre-training algorithms. Using Inductive Moment Matching (IMM) as a concrete example, we demonstrate how addressing limitations in the inference process of diffusion models through targeted modifications yields a stable, single-stage algorithm that achieves superior sample quality with over an order of magnitude greater inference efficiency.
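To make the "inference-first" idea concrete: in IMM the network is conditioned on both the current timestep and the target timestep, so a single forward pass can jump across a large span of the denoising trajectory, and sampling reduces to a short loop over a coarse time grid whose length is an inference-time knob. The sketch below illustrates only that few-step sampling loop; the stand-in network `f_theta`, the linear time grid, and the step count are illustrative assumptions, not the paper's actual parameterization or training procedure.

```python
import numpy as np

def f_theta(x_t, t, s):
    """Stand-in for a trained IMM-style network.

    The key interface is that the network sees BOTH the current time t and the
    target time s, so one forward pass maps a sample at time t directly to an
    estimate at time s (instead of a tiny ODE/SDE step). This placeholder just
    blends the input toward a bounded guess, purely for illustration.
    """
    alpha = s / t
    return alpha * x_t + (1.0 - alpha) * np.tanh(x_t)  # hypothetical; a real model is a neural net

def few_step_sample(shape, steps=4, seed=0):
    """Few-step sampling: walk a coarse time grid from t=1 (pure noise) toward t~0.

    Because the network maps (x_t, t, s) -> x_s directly, the number of
    refinement steps is an inference-time choice: more steps refine the sample,
    while even one or two steps already produce an output.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)            # start from Gaussian noise at t = 1
    times = np.linspace(1.0, 0.0, steps + 1)  # coarse schedule, e.g. [1.0, 0.75, 0.5, 0.25, 0.0]
    for t, s in zip(times[:-1], times[1:]):
        x = f_theta(x, t, max(s, 1e-3))       # jump from current time t to target time s
    return x

sample = few_step_sample((4, 4), steps=4)
```

With a trained network in place of the placeholder, the same loop run with more steps trades compute for quality, which is the scaling axis (refinement steps) that the abstract argues pre-training algorithms should be designed around.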
Similar Papers
Inductive Moment Matching
Machine Learning (CS)
Creates amazing pictures super fast with less effort.
Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling
CV and Pattern Recognition
Makes AI draw better pictures faster.
Exploring Training and Inference Scaling Laws in Generative Retrieval
Information Retrieval
Makes computers find information by writing it.