Selective Masking Adversarial Attack on Automatic Speech Recognition Systems

Published: April 6, 2025 | arXiv ID: 2504.04394v1

By: Zheng Fang, Shenyi Zhang, Tao Wang and more

Potential Business Impact:

Tricks voice assistants into hearing only one of two simultaneous speakers.

Business Areas:
Speech Recognition, Data and Analytics, Software

Extensive research has shown that Automatic Speech Recognition (ASR) systems are vulnerable to audio adversarial attacks. Current attacks mainly focus on single-source scenarios, ignoring dual-source scenarios in which two people speak simultaneously. To bridge this gap, we propose the Selective Masking Adversarial (SMA) attack, which ensures that one audio source is selected for recognition while the other is muted in dual-source scenarios. To better adapt to the dual-source scenario, the SMA attack constructs the normal dual-source audio from the muted audio and the selected audio. It initializes the adversarial perturbation with small Gaussian noise and iteratively optimizes it using a selective masking optimization algorithm. Extensive experiments demonstrate that the SMA attack can generate effective and imperceptible audio adversarial examples in the dual-source scenario, achieving an average attack success rate of 100% and a signal-to-noise ratio of 37.15 dB on Conformer-CTC, outperforming the baselines.
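
The setup described in the abstract can be illustrated with a minimal NumPy sketch. It covers only the pieces the abstract names explicitly: building a dual-source mixture, initializing a perturbation with small Gaussian noise, and measuring the signal-to-noise ratio. The function names (mix_sources, init_perturbation, snr_db), the additive mixing rule, the sigma value, and the SNR definition (ratio of signal energy to perturbation energy, in dB) are all assumptions made for illustration; the paper's own construction and metric may differ, and the selective masking optimization itself is not reproduced here.

```python
import numpy as np


def mix_sources(selected, muted):
    """Sum two mono waveforms sample by sample (assumed mixing rule).

    The shorter waveform is zero-padded to the length of the longer one.
    """
    n = max(len(selected), len(muted))
    mixed = np.zeros(n, dtype=np.float64)
    mixed[: len(selected)] += selected
    mixed[: len(muted)] += muted
    return mixed


def init_perturbation(length, sigma=1e-3, seed=0):
    """Draw a small zero-mean Gaussian perturbation (sigma is hypothetical)."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, sigma, size=length)


def snr_db(clean, perturbation):
    """SNR in dB under the common energy-ratio definition (an assumption)."""
    signal_power = np.sum(np.asarray(clean, dtype=np.float64) ** 2)
    noise_power = np.sum(np.asarray(perturbation, dtype=np.float64) ** 2)
    return 10.0 * np.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Two toy "speakers": sine tones at different pitches, 1 s at 16 kHz.
    t = np.arange(16000) / 16000.0
    selected = 0.5 * np.sin(2 * np.pi * 220.0 * t)
    muted = 0.5 * np.sin(2 * np.pi * 330.0 * t)

    mixed = mix_sources(selected, muted)
    delta = init_perturbation(len(mixed))
    adversarial = mixed + delta  # starting point before any optimization

    print(f"SNR of initial perturbation: {snr_db(mixed, delta):.2f} dB")
```

In an actual attack, the perturbation would then be updated iteratively against the target ASR model's loss. A higher SNR means the perturbation is quieter relative to the speech.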

Country of Origin
🇨🇳 China

Page Count
6 pages

Category
Computer Science:
Cryptography and Security