Watermark Robustness and Radioactivity May Be at Odds in Federated Learning
By: Leixu Huang, Zedian Shao, Teodora Baluta
Potential Business Impact:
Tracks AI-generated writing even after it is changed.
Federated learning (FL) enables fine-tuning large language models (LLMs) across distributed data sources. As these sources increasingly include LLM-generated text, provenance tracking becomes essential for accountability and transparency. We adapt LLM watermarking for data provenance in FL, where a subset of clients computes local updates on watermarked data and the server averages all updates into the global LLM. In this setup, watermarks are radioactive: the watermark signal remains detectable with high confidence after fine-tuning. The $p$-value can reach $10^{-24}$ even when as little as $6.6\%$ of the data is watermarked. However, the server can act as an active adversary that wants to preserve model utility while evading provenance tracking. Our observation is that updates induced by watermarked synthetic data appear as outliers relative to updates from non-watermarked data. Our adversary therefore applies strong robust aggregation, which filters out these outliers and, with them, the watermark signal. None of the evaluated radioactive watermarks is robust against such an active filtering server. Our work suggests fundamental trade-offs between radioactivity, robustness, and utility.
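To make the aggregation step concrete, below is a minimal, hypothetical sketch (not the authors' implementation) contrasting plain federated averaging with one common robust aggregator, a coordinate-wise trimmed mean, to illustrate how update vectors induced by watermarked data could be filtered out when they appear as outliers. The function names, the synthetic update vectors, and the choice of trimmed mean as the robust rule are illustrative assumptions.

```python
# Hypothetical sketch: FedAvg vs. a robust (coordinate-wise trimmed-mean) aggregator.
# Not the paper's implementation; it only illustrates how outlier updates,
# such as those induced by watermarked data, can be averaged in or filtered out.
import numpy as np

def fedavg(updates):
    # Plain federated averaging: mean of all client updates.
    return np.mean(updates, axis=0)

def trimmed_mean(updates, trim_frac=0.2):
    # Coordinate-wise trimmed mean: sort each coordinate across clients,
    # drop the largest and smallest values, then average the rest.
    sorted_updates = np.sort(updates, axis=0)
    k = int(len(sorted_updates) * trim_frac)
    return np.mean(sorted_updates[k:len(sorted_updates) - k], axis=0)

rng = np.random.default_rng(0)
dim, n_clients, n_watermarked = 16, 30, 2  # 2/30 ~ 6.6% watermarked clients

# Benign updates cluster around zero; watermarked updates carry an extra shift.
benign = rng.normal(0.0, 0.1, size=(n_clients - n_watermarked, dim))
watermarked = rng.normal(0.0, 0.1, size=(n_watermarked, dim)) + 1.0
updates = np.vstack([benign, watermarked])

print("FedAvg retains the watermark shift:", fedavg(updates).mean())
print("Trimmed mean suppresses it:        ", trimmed_mean(updates).mean())
```

In this toy setup the plain average keeps a small residual shift contributed by the watermarked clients, while the trimmed mean discards those outlier coordinates and the shift largely disappears, mirroring the trade-off the abstract describes between robust aggregation and watermark radioactivity.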
Similar Papers
HMARK: Radioactive Multi-Bit Semantic-Latent Watermarking for Diffusion Models
Cryptography and Security
Marks AI art to show if it used stolen pictures.
Robust Client-Server Watermarking for Split Federated Learning
Cryptography and Security
Protects AI model ownership for everyone involved.
Watermarks for Language Models via Probabilistic Automata
Cryptography and Security
Makes AI writing harder to fake.