CDLM: Consistency Diffusion Language Models For Faster Sampling
By: Minseo Kim, Chenfeng Xu, Coleman Hooper, and more
Potential Business Impact:
Makes AI write and code much faster.
Diffusion Language Models (DLMs) offer a promising parallel generation paradigm but suffer from slow inference due to numerous refinement steps and the inability to use standard KV caching. We introduce CDLM (Consistency Diffusion Language Models), a training-based acceleration method that simultaneously tackles both bottlenecks. CDLM integrates consistency modeling to drastically reduce the number of required sampling steps by enabling multi-token finalization. Furthermore, we enforce a block-wise causal attention mask during fine-tuning, making the model fully compatible with KV caching. Experiments show CDLM achieves 3.6x-14.5x lower latency while maintaining competitive accuracy on math and coding tasks. The full training and evaluation code is available at https://github.com/SqueezeAILab/CDLM.
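To illustrate the KV-caching ingredient, here is a minimal sketch (not the authors' implementation) of a block-wise causal attention mask: tokens attend bidirectionally within their own block and causally to all earlier blocks, so the keys and values of finished blocks can be cached. The function name and the `seq_len`/`block_size` parameters are illustrative assumptions.

```python
import torch

def block_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
    """Boolean mask of shape (seq_len, seq_len).

    mask[i, j] is True when query position i may attend to key position j,
    i.e. when j's block index is <= i's block index.
    """
    block_ids = torch.arange(seq_len) // block_size           # block index per position
    return block_ids.unsqueeze(1) >= block_ids.unsqueeze(0)   # (seq_len, seq_len)

if __name__ == "__main__":
    mask = block_causal_mask(seq_len=8, block_size=4)
    # Positions 0-3 see only block 0; positions 4-7 see blocks 0 and 1.
    print(mask.int())
```

Because attention never looks into future blocks under this mask, previously finalized blocks behave like a standard autoregressive prefix, which is what makes conventional KV caching applicable.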
Similar Papers
A Survey on Diffusion Language Models
Computation and Language
Makes computers write faster and understand better.
Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration
Machine Learning (CS)
Makes AI write and create much faster.
Diffusion Language Models are Super Data Learners
Machine Learning (CS)
Makes AI better at writing code with less data.