Removal Attack and Defense on AI-generated Content Latent-based Watermarking
By: De Zhang Lee, Han Fang, Hanyi Wang, and more
Potential Business Impact:
Hides AI art secrets from sneaky removers.
Digital watermarks can be embedded into AI-generated content (AIGC) by initializing the generation process with starting points sampled from a secret distribution. When combined with pseudorandom error-correcting codes, such watermarked outputs can remain indistinguishable from unwatermarked ones while maintaining robustness under white noise. In this paper, we go beyond indistinguishability and investigate security under removal attacks. We demonstrate that indistinguishability alone does not necessarily guarantee resistance to adversarial removal. Specifically, we propose a novel attack that exploits boundary information leaked by the locations of watermarked objects. This attack significantly reduces the distortion required to remove watermarks, by up to a factor of $15\times$ compared to a baseline white-noise attack under certain settings. To mitigate such attacks, we introduce a defense mechanism that applies a secret transformation to hide the boundary, and we prove that this transformation effectively renders any attacker's perturbation equivalent to that of a naive white-noise adversary. Our empirical evaluations, conducted on multiple versions of Stable Diffusion, validate the effectiveness of both the attack and the proposed defense, highlighting the importance of addressing boundary leakage in latent-based watermarking schemes.
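To make the mechanism concrete, here is a minimal sketch of one way such a scheme could look. It assumes the secret distribution is realized as a pseudorandom sign pattern on latent coordinates, and that the defensive secret transformation is a random orthogonal matrix; these choices, and every name in the code (embed_latent, detect, Q), are illustrative assumptions rather than the paper's actual construction.

```python
# A minimal, illustrative sketch (not the paper's construction).
# Assumptions: the "secret distribution" is a pseudorandom sign pattern on
# latent coordinates; the defensive "secret transformation" is a random
# orthogonal matrix Q. All names here are hypothetical.
import numpy as np

d = 256                                   # toy latent dimension (SD latents are larger)
key_rng = np.random.default_rng(seed=42)  # seeded by the secret watermarking key
signs = key_rng.choice([-1.0, 1.0], size=d)

def embed_latent(rng: np.random.Generator) -> np.ndarray:
    """Sample a Gaussian latent whose coordinate signs follow the secret key."""
    return np.abs(rng.standard_normal(d)) * signs

def detect(z: np.ndarray, threshold: float = 0.6) -> bool:
    """Flag as watermarked when sign agreement with the key exceeds chance (~0.5)."""
    return np.mean(np.sign(z) == signs) > threshold

# Defense sketch: generation uses Q @ z and detection inverts with Q.T.
# Q preserves the Gaussian latent distribution, and because the attacker
# does not know Q, any perturbation delta it applies maps back to Q.T @ delta,
# which is directionally uninformative: no better than white noise.
defense_rng = np.random.default_rng(seed=7)   # seeded by a second secret key
Q, _ = np.linalg.qr(defense_rng.standard_normal((d, d)))

gen_rng = np.random.default_rng()
z = embed_latent(gen_rng)        # watermarked starting point
z_public = Q @ z                 # latent actually fed to the generator
assert detect(Q.T @ z_public)    # detector undoes the secret rotation
```

The design point the sketch tries to capture is that an orthogonal transform keeps Gaussian latents Gaussian, so the published latents leak no boundary information an attacker could exploit.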
Similar Papers
On the Information-Theoretic Fragility of Robust Watermarking under Diffusion Editing
Cryptography and Security
Breaks hidden codes in pictures using AI.
Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image
CV and Pattern Recognition
Breaks hidden codes in AI-generated pictures.