HATS: High-Accuracy Triple-Set Watermarking for Large Language Models
By: Zhiqing Hu, Chenxu Zhao, Jiazhong Lu, and more
Potential Business Impact:
Marks AI writing so you know it came from a machine.
Misuse of LLM-generated text can be curbed by watermarking techniques that embed implicit signals into the output. We propose a watermark that partitions the vocabulary at each decoding step into three sets (Green/Yellow/Red) with fixed ratios and restricts sampling to the Green and Yellow sets. At detection time, we replay the same partitions, compute Green-enrichment and Red-depletion statistics, convert them to one-sided z-scores, and aggregate their p-values via Fisher's method to decide whether a passage is watermarked. We implement generation, detection, and testing on Llama 2 7B, and evaluate true-positive rate, false-positive rate, and text quality. Results show that the triple-partition scheme achieves high detection accuracy at a fixed false-positive rate while preserving readability.
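To make the pipeline concrete, the sketch below shows how a triple-partition generator and detector of this kind could be implemented. It is a minimal illustration, not the paper's method: the abstract does not specify how partitions are seeded or what the fixed ratios are, so the keyed hash-of-previous-token seeding, the values of GAMMA_GREEN and GAMMA_YELLOW, the SECRET_KEY, and all function names here are assumptions for illustration.

```python
import hashlib

import numpy as np
from scipy.stats import chi2, norm

VOCAB_SIZE = 32000             # Llama 2 vocabulary size
GAMMA_GREEN = 0.25             # illustrative fixed ratios (assumed, not from the paper)
GAMMA_YELLOW = 0.25
GAMMA_RED = 1.0 - GAMMA_GREEN - GAMMA_YELLOW
SECRET_KEY = b"watermark-key"  # hypothetical detection key


def _ranks(prev_token: int) -> np.ndarray:
    """Rank of every vocabulary id in a keyed pseudorandom permutation seeded
    by the previous token (one common seeding choice; the paper's exact
    scheme may differ). Ranks [0, G) are Green, [G, G+Y) Yellow, rest Red."""
    digest = hashlib.sha256(SECRET_KEY + prev_token.to_bytes(4, "little")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "little"))
    perm = rng.permutation(VOCAB_SIZE)
    ranks = np.empty(VOCAB_SIZE, dtype=np.int64)
    ranks[perm] = np.arange(VOCAB_SIZE)   # inverse permutation: token id -> rank
    return ranks


def mask_red_logits(logits: np.ndarray, prev_token: int) -> np.ndarray:
    """Generation step: mask the Red set's logits so sampling is restricted
    to Green union Yellow, as the abstract describes."""
    cutoff = int((GAMMA_GREEN + GAMMA_YELLOW) * VOCAB_SIZE)
    masked = logits.copy()
    masked[_ranks(prev_token) >= cutoff] = -np.inf  # Red tokens become unsampleable
    return masked


def detect(tokens: list[int], alpha: float = 1e-3) -> tuple[bool, float]:
    """Detection step: replay the partitions, form Green-enrichment and
    Red-depletion z-scores, and combine their p-values with Fisher's method."""
    n = len(tokens) - 1
    green_cut = int(GAMMA_GREEN * VOCAB_SIZE)
    red_cut = int((GAMMA_GREEN + GAMMA_YELLOW) * VOCAB_SIZE)
    n_green = n_red = 0
    for prev, cur in zip(tokens, tokens[1:]):
        rank = _ranks(prev)[cur]          # recomputed per step; cache in practice
        if rank < green_cut:
            n_green += 1
        elif rank >= red_cut:
            n_red += 1
    # One-sided z-scores under the null of unwatermarked text, where each
    # token falls into a set with probability equal to its fixed ratio.
    z_green = (n_green - n * GAMMA_GREEN) / np.sqrt(n * GAMMA_GREEN * (1 - GAMMA_GREEN))
    z_red = (n * GAMMA_RED - n_red) / np.sqrt(n * GAMMA_RED * (1 - GAMMA_RED))
    p_green = norm.sf(z_green)            # P(Green count this high by chance)
    p_red = norm.sf(z_red)                # P(Red count this low by chance)
    # Fisher's method: -2 * sum(ln p_i) ~ chi-squared with 2k = 4 dof under
    # the null. It treats the two p-values as independent; the Green and Red
    # counts are actually negatively correlated, so this is an approximation.
    stat = -2.0 * (np.log(max(p_green, 1e-300)) + np.log(max(p_red, 1e-300)))
    p_combined = chi2.sf(stat, df=4)
    return bool(p_combined < alpha), float(p_combined)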
Similar Papers
Watermarking for Factuality: Guiding Vision-Language Models Toward Truth via Tri-layer Contrastive Decoding
CV and Pattern Recognition
Makes AI describe pictures more truthfully.
Improving Detection of Watermarked Language Models
Computation and Language
Finds fake AI writing by combining methods.
How Good is Post-Hoc Watermarking With Language Model Rephrasing?
Cryptography and Security
Makes AI writing traceable, even after it's written.