Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack
By: Roee Ziv, Raz Lapid, Moshe Sipper
Potential Business Impact:
Hidden audio noise can trick AI into hearing things that were never said and responding the way an attacker wants.
Audio-language models combine audio encoders with large language models to enable multimodal reasoning, but they also introduce new security vulnerabilities. We propose a universal targeted latent-space attack, an encoder-level adversarial attack that manipulates audio latent representations to induce attacker-specified outputs in downstream language generation. Unlike prior waveform-level or input-specific attacks, our approach learns a single universal perturbation that generalizes across inputs and speakers and does not require access to the language model. Experiments on Qwen2-Audio-7B-Instruct demonstrate consistently high attack success rates with minimal perceptual distortion, revealing a critical and previously underexplored attack surface at the encoder level of multimodal systems.
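To make the encoder-level objective concrete, below is a minimal illustrative sketch of how such a universal targeted latent-space attack could be set up: a single perturbation is optimized so that the frozen audio encoder maps any perturbed input toward an attacker-chosen target latent, with no gradients ever flowing through the language model. The toy encoder, the way the target latent is chosen, and all names and hyperparameters (audio_encoder, target_latent, epsilon, etc.) are assumptions for illustration, not the authors' implementation.

# Illustrative sketch (PyTorch): learn one universal waveform perturbation that
# drives a frozen audio encoder's latents toward an attacker-chosen target.
# The toy encoder, target choice, and hyperparameters are assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the frozen audio encoder of an audio LLM (e.g., the Whisper-style
# encoder inside Qwen2-Audio-7B-Instruct); in practice the real module is loaded
# and frozen, and only the perturbation is optimized.
audio_encoder = nn.Sequential(
    nn.Conv1d(1, 64, kernel_size=400, stride=160),   # crude frame-level features
    nn.GELU(),
    nn.Conv1d(64, 768, kernel_size=3, stride=2, padding=1),
).eval()
for p in audio_encoder.parameters():
    p.requires_grad_(False)

def encode(wav: torch.Tensor) -> torch.Tensor:
    # (batch, samples) waveforms -> (batch, dim, frames) latent sequences
    return audio_encoder(wav.unsqueeze(1))

sample_rate, seconds = 16_000, 4
num_samples = sample_rate * seconds

# Assumed target: the latent of a clip that already elicits the attacker-specified
# response from the downstream LLM; here just a random placeholder waveform.
target_latent = encode(torch.randn(1, num_samples)).detach()

# One universal perturbation shared across all inputs and speakers, kept inside a
# small L-infinity ball so the distortion stays barely perceptible.
epsilon = 0.01
delta = torch.zeros(1, num_samples, requires_grad=True)
optimizer = torch.optim.Adam([delta], lr=1e-3)

# Placeholder corpus of benign utterances; a real attack would use many speakers.
train_wavs = torch.randn(64, num_samples).clamp(-1.0, 1.0)

for step in range(200):
    batch = train_wavs[torch.randint(0, train_wavs.size(0), (8,))]
    adv = (batch + delta).clamp(-1.0, 1.0)            # keep a valid audio range
    latents = encode(adv)
    # Pull every perturbed input's latent toward the single target latent.
    loss = F.mse_loss(latents, target_latent.expand_as(latents))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        delta.clamp_(-epsilon, epsilon)               # enforce the perceptual budget

In this sketch, adding the learned delta to any utterance at inference time pushes its encoder latents toward the target, so the downstream language model, which is never accessed during optimization, would tend to produce the attacker-specified output.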
Similar Papers
Backdoor Attacks Against Speech Language Models
Computation and Language
Hidden backdoors can make speech-understanding AI misbehave on an attacker's cue.
Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World
Cryptography and Security
Makes voice assistants unsafe to use.
Multilingual and Multi-Accent Jailbreaking of Audio LLMs
Sound
Audio prompts in many languages and accents can jailbreak audio AI past its safety limits.