Protecting Your Voice: Temporal-aware Robust Watermarking
By: Yue Li, Weizhi Liu, Dongdong Lin and more
Potential Business Impact:
Keeps synthesized voices sounding natural while leaving them detectable as fake.
Rapid advances in generative models have made synthesized voices hard to tell apart from real ones. To resolve this ambiguity, embedding watermarks into the frequency-domain features of synthesized voices has become common practice. However, the robustness gained by working in the frequency domain often comes at the expense of fine-grained voice features, which reduces fidelity. To improve fidelity while maintaining robustness by learning time-domain features more comprehensively, we introduce a temporal-aware robust watermarking (True) method for protecting speech and singing voices. An integrated, structurally lightweight content-driven encoder reconstructs the watermarked waveform, and a temporal-aware gated convolutional network recovers the watermark bit by bit. Comprehensive experiments and comparisons with existing state-of-the-art methods demonstrate the superior fidelity and strong robustness of True, which achieves an average PESQ score of 4.63.
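To make the idea of gated convolution followed by bit-wise recovery concrete, here is a minimal pure-Python sketch. It is not the paper's architecture: the abstract does not specify the gating form, so the tanh-times-sigmoid gate (the style used in WaveNet) is an assumption, and the function names, kernels, and input values are all hypothetical.

```python
import math

def conv1d(x, kernel):
    """Valid-mode 1D cross-correlation of a sequence with a kernel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_conv1d(x, feature_kernel, gate_kernel):
    """Assumed gate: tanh(feature branch) scaled by sigmoid(gate branch)."""
    features = conv1d(x, feature_kernel)
    gates = conv1d(x, gate_kernel)
    return [math.tanh(f) * sigmoid(g) for f, g in zip(features, gates)]

def decode_bits(logits):
    """Bit-wise decision: a positive value is read as 1, anything else as 0."""
    return [1 if v > 0 else 0 for v in logits]

def bit_accuracy(decoded, reference):
    """Fraction of positions where the decoded bit matches the reference bit."""
    matches = sum(d == r for d, r in zip(decoded, reference))
    return matches / len(reference)

# Hypothetical toy input and kernels, chosen only for illustration.
waveform = [0.5, -1.0, 0.25, 1.0, -0.5]
logits = gated_conv1d(waveform, [1.0, -1.0], [0.5, 0.5])
bits = decode_bits(logits)
```

In a trained extractor the kernels would be learned and stacked over many layers; the sketch only shows how a gate modulates the feature branch sample by sample before each watermark bit is thresholded, and how a bit accuracy could be computed against the embedded message.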
Similar Papers
TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Attribution
Multimedia
Marks fake voices so creators keep their work.
AWARE: Audio Watermarking with Adversarial Resistance to Edits
Sound
Protects music from being copied without permission.
Efficient Speech Watermarking for Speech Synthesis via Progressive Knowledge Distillation
Sound
Stops fake voices from being used wrongly.