Score: 1

Efficient Speech Watermarking for Speech Synthesis via Progressive Knowledge Distillation

Published: September 24, 2025 | arXiv ID: 2509.19812v1

By: Yang Cui, Peter Pan, Lei He and more

BigTech Affiliations: Microsoft

Potential Business Impact:

Helps trace cloned or synthetic voices back to their source and deter their misuse.

Business Areas:
Speech Recognition, Data and Analytics, Software

With the rapid advancement of speech generative models, unauthorized voice cloning poses significant privacy and security risks. Speech watermarking offers a viable solution for tracing sources and preventing misuse. Current watermarking technologies fall mainly into two categories: DSP-based methods and deep learning-based methods. DSP-based methods are efficient but vulnerable to attacks, whereas deep learning-based methods offer robust protection at the expense of significantly higher computational cost. To improve computational efficiency while preserving robustness, we propose PKDMark, a lightweight deep learning-based speech watermarking method that leverages progressive knowledge distillation (PKD). Our approach proceeds in two stages: (1) training a high-performance teacher model using an invertible neural network-based architecture, and (2) transferring the teacher's capabilities to a compact student model through progressive knowledge distillation. This process reduces computational costs by 93.6% while maintaining a high level of robustness and imperceptibility. Experimental results demonstrate that our distilled model achieves an average detection F1 score of 99.6% with a PESQ of 4.30 under advanced distortions, enabling efficient speech watermarking for real-time speech synthesis applications.
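The abstract gives no architectural detail beyond "invertible neural network-based", so the following is only a generic illustration, not PKDMark's teacher model. It sketches an additive coupling block, a common building block of invertible networks, to show why invertibility suits watermarking: the same parameters can run forward to embed and backward to recover. All names here (`make_shift_fn`, `couple_forward`, `couple_inverse`) are hypothetical, and the fixed random `tanh` functions stand in for learned subnetworks.

```python
import math
import random


def make_shift_fn(seed, size):
    """Build a fixed nonlinear map R^size -> R^size.

    Hypothetical stand-in for a learned subnetwork; it does not need
    to be invertible itself for the coupling block to be invertible.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]

    def shift(x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

    return shift


def couple_forward(audio_half, mark_half, f, g):
    """Additive coupling: each branch is shifted using only the other branch."""
    audio_out = [a + s for a, s in zip(audio_half, f(mark_half))]
    mark_out = [m + s for m, s in zip(mark_half, g(audio_out))]
    return audio_out, mark_out


def couple_inverse(audio_out, mark_out, f, g):
    """Undo couple_forward by subtracting the same shifts in reverse order."""
    mark_half = [m - s for m, s in zip(mark_out, g(audio_out))]
    audio_half = [a - s for a, s in zip(audio_out, f(mark_half))]
    return audio_half, mark_half


# Toy inputs (assumed values, for illustration only).
f = make_shift_fn(seed=0, size=4)
g = make_shift_fn(seed=1, size=4)
audio = [0.10, -0.20, 0.30, 0.05]
mark = [1.0, 0.0, 1.0, 1.0]

encoded_audio, encoded_mark = couple_forward(audio, mark, f, g)
rec_audio, rec_mark = couple_inverse(encoded_audio, encoded_mark, f, g)

# Largest round-trip error across both branches.
max_err = max(abs(x - y) for x, y in zip(audio + mark, rec_audio + rec_mark))
```

The round trip recovers both inputs up to floating-point rounding, whatever `f` and `g` compute. A real system would stack several such blocks and train them against distortions; none of that is shown here.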

Country of Origin
🇺🇸 United States

Page Count
7 pages

Category
Computer Science: Sound