Manipulating Multimodal Agents via Cross-Modal Prompt Injection
By: Le Wang , Zonghao Ying , Tianyuan Zhang and more
Potential Business Impact:
Tricks smart AI into doing bad things.
The emergence of multimodal large language models has redefined the agent paradigm by integrating language and vision modalities with external data sources, enabling agents to better interpret human instructions and execute increasingly complex tasks. However, in this paper, we identify a critical yet previously overlooked security vulnerability in multimodal agents: cross-modal prompt injection attacks. To exploit this vulnerability, we propose CrossInject, a novel attack framework in which attackers embed adversarial perturbations across multiple modalities to align with target malicious content, allowing external instructions to hijack the agent's decision-making process and execute unauthorized tasks. Our approach incorporates two key coordinated components. First, we introduce Visual Latent Alignment, where we optimize adversarial features to the malicious instructions in the visual embedding space based on a text-to-image generative model, ensuring that adversarial images subtly encode cues for malicious task execution. Subsequently, we present Textual Guidance Enhancement, where a large language model is leveraged to construct the black-box defensive system prompt through adversarial meta prompting and generate an malicious textual command that steers the agent's output toward better compliance with attackers' requests. Extensive experiments demonstrate that our method outperforms state-of-the-art attacks, achieving at least a +30.1% increase in attack success rates across diverse tasks. Furthermore, we validate our attack's effectiveness in real-world multimodal autonomous agents, highlighting its potential implications for safety-critical applications.
Similar Papers
Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks
Cryptography and Security
Protects smart AI from bad instructions.
WebInject: Prompt Injection Attack to Web Agents
Machine Learning (CS)
Makes websites trick robots into doing bad things.
Multimodal Prompt Injection Attacks: Risks and Defenses for Modern LLMs
Cryptography and Security
Finds ways AI can be tricked.