Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks
By: Chang Liu, Haolin Wu, Xi Yang, and more
Potential Business Impact:
Tricks voice translators into saying the wrong things.
As speech translation (ST) systems become increasingly prevalent, understanding their vulnerabilities is crucial for ensuring robust and reliable communication. However, limited work has explored this issue in depth. This paper explores methods of compromising these systems through imperceptible audio manipulations. Specifically, we present two innovative approaches: (1) the injection of perturbations into source audio, and (2) the generation of adversarial music designed to guide targeted translation; we also conduct more practical over-the-air attacks in the physical world. Our experiments reveal that carefully crafted audio perturbations can mislead translation models into producing targeted, harmful outputs, while adversarial music achieves this goal more covertly, exploiting the natural imperceptibility of music. These attacks prove effective across multiple languages and translation models, highlighting a systemic vulnerability in current ST architectures. The implications of this research extend beyond immediate security concerns, shedding light on the interpretability and robustness of neural speech processing systems. Our findings underscore the need for advanced defense mechanisms and more resilient architectures for audio systems. More details and samples can be found at https://adv-st.github.io.
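To make the first approach concrete, the sketch below shows one common way such a targeted attack is formulated: gradient-based optimization of a small additive perturbation, bounded in L-infinity norm, that steers the model toward an attacker-chosen translation. This is a minimal illustration, not the paper's exact method; the `st_model` interface, the `epsilon` bound, and the teacher-forced cross-entropy objective are all assumptions for the sake of the example.

```python
# Minimal sketch of approach (1): optimizing an imperceptible additive
# perturbation so a speech translation model emits a chosen target sentence.
# `st_model` is a hypothetical model that, given audio and teacher-forced
# target token ids, returns per-token logits; real systems will differ.
import torch
import torch.nn.functional as F

def targeted_audio_perturbation(st_model, audio, target_token_ids,
                                epsilon=0.002, steps=500, lr=1e-3):
    """Find a perturbation delta with ||delta||_inf <= epsilon that pushes
    the model's translation of (audio + delta) toward target_token_ids."""
    delta = torch.zeros_like(audio, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Assumed interface: logits over the target-language vocabulary
        # for each position of the desired translation.
        logits = st_model(audio + delta, labels=target_token_ids)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               target_token_ids.view(-1))
        loss.backward()
        optimizer.step()
        # Project back onto the epsilon ball after each step so the
        # perturbation stays below an audibility threshold.
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)
    return (audio + delta).detach()
```

The second approach, adversarial music, can be read as the same optimization with the perturbation constrained to sound like music rather than to be small; the over-the-air variant additionally models playback and room distortion during optimization.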
Similar Papers
Are Modern Speech Enhancement Systems Vulnerable to Adversarial Attacks?
Audio and Speech Processing
Makes voices say different things with hidden sounds.
Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks
Sound
Makes voice assistants hear wrong words easily.
Impact of Phonetics on Speaker Identity in Adversarial Voice Attack
Sound
Shows how speech sounds shape attacks that fake a speaker's voice.