Watermarks for Language Models via Probabilistic Automata
By: Yangkun Wang, Jingbo Shang
Potential Business Impact:
Makes AI writing harder to fake.
A recent watermarking scheme for language models achieves distortion-free embedding and robustness to edit-distance attacks. However, it suffers from limited generation diversity and high detection overhead. In parallel, recent research has focused on undetectability, a property ensuring that watermarks remain difficult for adversaries to detect and spoof. In this work, we introduce a new class of watermarking schemes constructed through probabilistic automata. We present two instantiations: (i) a practical scheme with exponential generation diversity and computational efficiency, and (ii) a theoretical construction with formal undetectability guarantees under cryptographic assumptions. Extensive experiments on LLaMA-3B and Mistral-7B validate the superior performance of our scheme in terms of robustness and efficiency.
Similar Papers
Watermarking Discrete Diffusion Language Models
Cryptography and Security
Marks AI writing so you know it's fake.
Yet Another Watermark for Large Language Models
Cryptography and Security
Marks computer writing so you know it's real.