Generalized Interpolating Discrete Diffusion
By: Dimitri von Rütte, Janis Fluri, Yuhui Ding, and more
Potential Business Impact:
Lets AI fix its own writing mistakes.
While state-of-the-art language models achieve impressive results through next-token prediction, they have inherent limitations, such as the inability to revise already generated tokens. This has prompted exploration of alternative approaches such as discrete diffusion. However, masked diffusion, which has emerged as a popular choice due to its simplicity and effectiveness, reintroduces this inability to revise tokens. To overcome this, we generalize masked diffusion, deriving a new family of generalized interpolating discrete diffusion (GIDD) processes that offer greater flexibility in the design of the noising process. Leveraging a novel diffusion ELBO, we achieve compute-matched state-of-the-art performance in diffusion language modeling. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality and unlocking the ability for the model to correct its own mistakes, an area where autoregressive models have notoriously struggled. Code: https://github.com/dvruette/gidd/
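To make the hybrid masking-plus-uniform-noise idea concrete, here is a minimal sketch of a forward noising step in PyTorch. The function name, the `p_uniform` mixing weight, and the linear corruption schedule in `t` are illustrative assumptions, not the paper's exact GIDD parameterization or ELBO.

```python
import torch

def hybrid_noise(tokens, t, mask_id, vocab_size, p_uniform=0.1):
    """Illustrative hybrid corruption step (not the paper's exact schedule).

    Each token is kept with probability (1 - t); a corrupted token is
    replaced by a uniformly random token with probability p_uniform,
    and by the [MASK] token otherwise.
    """
    noisy = tokens.clone()
    # Decide which positions get corrupted at noise level t in [0, 1].
    corrupt = torch.rand_like(tokens, dtype=torch.float) < t
    # Among corrupted positions, choose uniform-noise vs. masking.
    use_uniform = torch.rand_like(tokens, dtype=torch.float) < p_uniform
    random_tokens = torch.randint_like(tokens, vocab_size)
    noisy[corrupt & use_uniform] = random_tokens[corrupt & use_uniform]
    noisy[corrupt & ~use_uniform] = mask_id
    return noisy
```

Because some corrupted positions carry plausible but wrong tokens rather than a mask, a denoiser trained on such data must learn to detect and overwrite errors, which is the mechanism behind the self-correction ability described in the abstract.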
Similar Papers
The Diffusion Duality
Machine Learning (CS)
Makes computers write stories much faster.
Any-Order Flexible Length Masked Diffusion
Machine Learning (CS)
Lets computers create text of any length.
Simple Denoising Diffusion Language Models
Machine Learning (CS)
Makes computers write better stories and sentences.