Deep Audio Watermarks are Shallow: Limitations of Post-Hoc Watermarking Techniques for Speech
By: Patrick O'Reilly, Zeyu Jin, Jiaqi Su, and more
Potential Business Impact:
Shows that today's hidden audio watermarks are easy to remove.
In the audio modality, state-of-the-art watermarking methods leverage deep neural networks to allow the embedding of human-imperceptible signatures in generated audio. The ideal is to embed signatures that can be detected with high accuracy when the watermarked audio is altered via compression, filtering, or other transformations. Existing audio watermarking techniques operate in a post-hoc manner, manipulating "low-level" features of audio recordings after generation (e.g. through the addition of a low-magnitude watermark signal). We show that this post-hoc formulation makes existing audio watermarks vulnerable to transformation-based removal attacks. Focusing on speech audio, we (1) unify and extend existing evaluations of the effect of audio transformations on watermark detectability, and (2) demonstrate that state-of-the-art post-hoc audio watermarks can be removed with no knowledge of the watermarking scheme and minimal degradation in audio quality.
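The post-hoc idea in the abstract (add a low-magnitude signal to finished audio, then detect it later) and the transformation-based removal attack can be illustrated with a deliberately simple toy. The sketch below uses a classical additive spread-spectrum watermark with a correlation detector. It is not one of the deep neural watermarks the paper evaluates, and it is not the paper's attack. The key, strength, threshold, sample rate, and function names are all hypothetical. The "attack" is a plain moving-average low-pass filter that never uses the secret key.

```python
import math
import random

SAMPLE_RATE = 16_000      # assumed sample rate (Hz) for the toy signal
KEY = 1234                # hypothetical secret key shared by embedder and detector
STRENGTH = 0.02           # hypothetical watermark amplitude, small relative to the host
THRESHOLD = STRENGTH / 2  # hypothetical decision threshold for the detector


def watermark_pattern(n, key):
    """Keyed pseudo-random +/-1 sequence (a classic spread-spectrum pattern)."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(audio, key, strength=STRENGTH):
    """Post-hoc embedding: add a low-magnitude keyed signal to already-finished audio."""
    pattern = watermark_pattern(len(audio), key)
    return [x + strength * p for x, p in zip(audio, pattern)]


def detect_score(audio, key):
    """Mean correlation with the keyed pattern: about STRENGTH if marked, near 0 if not."""
    pattern = watermark_pattern(len(audio), key)
    return sum(x * p for x, p in zip(audio, pattern)) / len(audio)


def moving_average(audio, k=8):
    """Key-agnostic removal transform: a causal k-tap moving-average low-pass filter.

    Averaging k samples shrinks the correlation with a white +/-1 pattern by
    roughly a factor of k, while a low-frequency tone passes mostly intact.
    """
    out, running = [], 0.0
    for i, x in enumerate(audio):
        running += x
        if i >= k:
            running -= audio[i - k]  # keep only the last k samples in the window
        out.append(running / min(i + 1, k))
    return out


# Toy host "audio": two seconds of a 440 Hz tone (a stand-in for real speech).
clean = [0.3 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
         for t in range(2 * SAMPLE_RATE)]
marked = embed(clean, KEY)
attacked = moving_average(marked)  # the attacker never touches KEY

for name, sig in [("clean", clean), ("marked", marked), ("attacked", attacked)]:
    score = detect_score(sig, KEY)
    print(f"{name:8s} score={score:+.4f} detected={score > THRESHOLD}")
```

In this toy, the watermark lives in "low-level" sample values that are spread across all frequencies, so a generic filter that the detector did not anticipate pushes the score toward the clean level. A crude 8-tap filter would also audibly muffle real speech. The abstract's claim of removal with minimal quality loss concerns the paper's own attacks on deep watermarks, and this sketch does not measure quality.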
Similar Papers
SoK: How Robust is Audio Watermarking in Generative AI models?
Cryptography and Security
Makes AI voices harder to fake or change.
The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures
Machine Learning (CS)
Makes voice security systems work better with hidden messages.
How Good is Post-Hoc Watermarking With Language Model Rephrasing?
Cryptography and Security
Makes AI writing traceable, even after it's written.