The Diffusion Duality
By: Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, and more
Potential Business Impact:
Enables computers to generate text much faster.
Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: http://s-sahoo.github.io/duo
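To make the "underlying Gaussian diffusion" insight concrete, here is a minimal sketch of the duality the abstract describes: corrupting a one-hot token with Gaussian noise and taking an argmax yields exactly the kind of corruption a uniform-state discrete diffusion applies (the token is either preserved or replaced by an effectively uniform one). This assumes a standard variance-preserving forward process; the names `gaussian_latent`, `discretize`, `vocab_size`, and `alpha_t` are illustrative and not Duo's actual API.

```python
# Sketch of the Gaussian -> uniform-state duality from the abstract,
# assuming the variance-preserving forward process
#   z_t = alpha_t * x + sqrt(1 - alpha_t^2) * eps
# applied to one-hot token vectors x. Illustrative names, not the paper's code.

import torch

def gaussian_latent(x_onehot: torch.Tensor, alpha_t: float) -> torch.Tensor:
    """Diffuse a batch of one-hot tokens with Gaussian noise at level alpha_t."""
    eps = torch.randn_like(x_onehot)
    return alpha_t * x_onehot + (1.0 - alpha_t**2) ** 0.5 * eps

def discretize(z_t: torch.Tensor) -> torch.Tensor:
    """Argmax over the vocabulary axis: the operation that maps Gaussian
    diffusion latents back to discrete tokens. Per the paper's key insight,
    the resulting token is either the original or looks uniformly random."""
    return z_t.argmax(dim=-1)

vocab_size, batch = 8, 100_000
tokens = torch.randint(vocab_size, (batch,))
x = torch.nn.functional.one_hot(tokens, vocab_size).float()

# As alpha_t shrinks, fewer tokens survive the argmax, and the survivors'
# replacements are close to uniform over the vocabulary.
for alpha_t in (0.9, 0.5, 0.1):
    corrupted = discretize(gaussian_latent(x, alpha_t))
    keep_rate = (corrupted == tokens).float().mean().item()
    print(f"alpha_t={alpha_t}: fraction of tokens preserved = {keep_rate:.3f}")
```

Running this shows the preserved-token fraction falling toward roughly 1/`vocab_size` as `alpha_t` decreases, which matches the uniform-state discrete diffusion marginal and is what lets Duo port curriculum learning and consistency distillation from the Gaussian setting.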
Similar Papers
Unified Multimodal Discrete Diffusion
CV and Pattern Recognition
Generates images and text together in one model, improving on earlier approaches.
Generalized Interpolating Discrete Diffusion
Computation and Language
Lets language models revise and correct their own output as they generate.
Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces
Machine Learning (CS)
Builds a single diffusion model that generates matching images and text.