Protecting Your Voice: Temporal-aware Robust Watermarking

Published: April 21, 2025 | arXiv ID: 2504.14832v2

By: Yue Li, Weizhi Liu, Dongdong Lin and more

Potential Business Impact:

Watermarks synthesized voices so they stay detectable as synthetic while still sounding natural.

Business Areas:
Speech Recognition, Data and Analytics, Software

The rapid advancement of generative models has made synthesized voices hard to tell apart from real ones. To resolve this ambiguity, embedding watermarks into the frequency-domain features of synthesized voices has become common practice. However, the robustness gained by working in the frequency domain often comes at the expense of fine-grained voice features, leading to a loss of fidelity. To enhance fidelity while maintaining robustness by learning time-domain features as fully as possible, we pioneer a temporal-aware robust watermarking (True) method for protecting speech and singing voices. For this purpose, an integrated content-driven encoder, which is structurally lightweight, is designed to reconstruct the watermarked waveform. Additionally, a temporal-aware gated convolutional network is designed to recover the watermark bit by bit. Comprehensive experiments and comparisons with existing state-of-the-art methods demonstrate the superior fidelity and strong robustness of True, which achieves an average PESQ score of 4.63.
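
The abstract describes bit-wise watermark recovery without giving details. As a rough illustration only, the sketch below shows how recovered bits are commonly compared against the embedded message. It is not the True decoder: the function names, the 0.5 threshold, and the example scores are hypothetical, and the per-bit scores stand in for whatever a real decoder network would output.

```python
from typing import List, Sequence


def recover_bits(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn per-bit decoder scores (assumed to lie in [0, 1]) into hard bits.

    The threshold is an assumption for illustration, not a value from the paper.
    """
    return [1 if s >= threshold else 0 for s in scores]


def bit_accuracy(embedded: Sequence[int], recovered: Sequence[int]) -> float:
    """Fraction of positions where the recovered bit equals the embedded bit."""
    if len(embedded) != len(recovered):
        raise ValueError("bit sequences must have the same length")
    if not embedded:
        raise ValueError("bit sequences must not be empty")
    matches = sum(e == r for e, r in zip(embedded, recovered))
    return matches / len(embedded)


# Hypothetical 8-bit watermark and made-up decoder scores for a distorted clip.
embedded = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.91, 0.12, 0.77, 0.43, 0.08, 0.35, 0.88, 0.61]

recovered = recover_bits(scores)
accuracy = bit_accuracy(embedded, recovered)
print(recovered, accuracy)
```

In this made-up example two of the eight bits flip, so the accuracy is 0.75. Robustness evaluations typically report this kind of bit-level agreement after the watermarked audio has been distorted, alongside a fidelity metric such as PESQ.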

Country of Origin
🇨🇳 China

Page Count
5 pages

Category
Computer Science:
Cryptography and Security