Watermarking Discrete Diffusion Language Models
By: Avi Bagchi, Akhil Bhimaraju, Moulik Choraria and more
Potential Business Impact:
Marks AI-generated text so it can be told apart from human writing.
Watermarking has emerged as a promising technique to track AI-generated content and differentiate it from authentic human creations. While prior work extensively studies watermarking for autoregressive large language models (LLMs) and image diffusion models, none addresses discrete diffusion language models, which are becoming popular due to their high inference throughput. In this paper, we introduce the first watermarking method for discrete diffusion models by applying the distribution-preserving Gumbel-max trick at every diffusion step and seeding the randomness with the sequence index to enable reliable detection. We experimentally demonstrate that our scheme is reliably detectable on state-of-the-art diffusion language models and analytically prove that it is distortion-free, with a probability of false detection that decays exponentially in the token sequence length.
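The idea of seeding Gumbel-max sampling by sequence index can be illustrated with a small toy sketch. This is not the authors' implementation: the key-hashing scheme, the toy vocabulary size, the random stand-in for the model's predicted distributions, and the detection statistic (a mean of -log(1 - u) values, in the style of earlier Gumbel-max watermarks for autoregressive models) are all assumptions made for illustration. All function names are hypothetical.

```python
import hashlib
import numpy as np

VOCAB_SIZE = 50  # toy vocabulary size (assumption, not from the paper)

def position_uniforms(key, position, vocab_size=VOCAB_SIZE):
    """Uniforms in (0, 1) that depend only on the secret key and the position."""
    digest = hashlib.sha256(f"{key}:{position}".encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return rng.uniform(low=1e-12, high=1.0 - 1e-12, size=vocab_size)

def gumbel_max_sample(probs, key, position):
    """Pick argmax(log p + Gumbel noise); the noise is seeded by position."""
    u = position_uniforms(key, position, len(probs))
    gumbel = -np.log(-np.log(u))
    return int(np.argmax(np.log(probs + 1e-30) + gumbel))

def toy_generate(key, length, seed=0):
    """Fill positions in an arbitrary order, mimicking diffusion-style unmasking."""
    rng = np.random.default_rng(seed)
    tokens = [None] * length
    for position in rng.permutation(length):
        # Stand-in for the model's predicted distribution at this position.
        probs = rng.dirichlet(np.ones(VOCAB_SIZE))
        tokens[int(position)] = gumbel_max_sample(probs, key, int(position))
    return tokens

def detection_score(tokens, key, vocab_size=VOCAB_SIZE):
    """Mean of -log(1 - u[token]); close to 1 when the text is unrelated to the key."""
    total = 0.0
    for position, token in enumerate(tokens):
        u = position_uniforms(key, position, vocab_size)
        total += -np.log(1.0 - u[token])
    return total / len(tokens)
```

Because the noise for each position depends only on the key and the index, the detector can recompute it without knowing the order in which positions were filled. In this sketch, text generated with the key should score well above 1, while text scored against a different key should stay near 1.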
Similar Papers
DMark: Order-Agnostic Watermarking for Diffusion Large Language Models
Machine Learning (CS)
Marks AI writing so you know it's from a computer.
Visual Watermarking in the Era of Diffusion Models: Advances and Challenges
CV and Pattern Recognition
Protects pictures from being copied without permission.
Yet Another Watermark for Large Language Models
Cryptography and Security
Marks computer writing so you know it's from a computer.