Are Modern Speech Enhancement Systems Vulnerable to Adversarial Attacks?
By: Rostislav Makarov, Lea Schönherr, Timo Gerkmann
Potential Business Impact:
Makes voices say different things with hidden sounds.
Machine learning approaches for speech enhancement are becoming increasingly expressive, enabling ever more powerful modifications of input signals. In this paper, we demonstrate that this expressiveness introduces a vulnerability: advanced speech enhancement models can be susceptible to adversarial attacks. Specifically, we show that adversarial noise, carefully crafted and psychoacoustically masked by the original input, can be injected such that the enhanced speech output conveys an entirely different semantic meaning. We experimentally verify that contemporary predictive speech enhancement models can indeed be manipulated in this way. Furthermore, we highlight that diffusion models with stochastic samplers exhibit inherent robustness to such adversarial attacks by design.
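To make the attack idea in the abstract concrete, below is a minimal, hypothetical sketch of a targeted adversarial perturbation against a speech enhancement model. It is not the authors' exact optimization: it assumes a differentiable PyTorch `enhancer` module, a `target` waveform carrying the attacker's desired content, and uses a simple per-sample amplitude bound `eps` as a crude stand-in for a proper psychoacoustic masking threshold.

```python
# Hypothetical sketch of a targeted attack on a predictive speech enhancer.
# Assumptions (not from the paper): `enhancer` is a differentiable
# torch.nn.Module mapping waveform -> enhanced waveform, and an L-infinity
# bound `eps` approximates the psychoacoustic masking constraint.
import torch


def targeted_attack(enhancer, clean, target, eps=0.01, steps=200, lr=1e-3):
    """Craft additive noise so that enhancer(clean + delta) resembles `target`.

    clean  : (1, T) original input waveform
    target : (1, T) waveform with the attacker's intended semantic content
    eps    : maximum per-sample perturbation (proxy for inaudibility)
    """
    delta = torch.zeros_like(clean, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        enhanced = enhancer(clean + delta)
        # Push the enhanced output toward the attacker's target utterance.
        loss = torch.nn.functional.mse_loss(enhanced, target)
        loss.backward()
        opt.step()
        # Project the perturbation back into the allowed "masked" region.
        with torch.no_grad():
            delta.clamp_(-eps, eps)

    return (clean + delta).detach()
```

The paper's observation about diffusion-based enhancers fits this picture: because stochastic samplers inject fresh randomness at inference time, the gradient computed in a loop like the one above targets only one realization of the sampler, which by design makes such attacks harder to mount.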
Similar Papers
Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks
Sound
Tricks voice translators to say wrong things.
Are Deep Speech Denoising Models Robust to Adversarial Noise?
Sound
Makes voice helpers hear wrong words on purpose.
Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks
Sound
Makes voice assistants hear wrong words easily.