Multimodal Prompt Injection Attacks: Risks and Defenses for Modern LLMs
By: Andrew Yeo, Daeseon Choi
Potential Business Impact:
Finds ways AI can be tricked.
Large Language Models (LLMs) have seen rapid adoption in recent years, with industries increasingly relying on them to maintain a competitive advantage. These models excel at interpreting user instructions and generating human-like responses, leading to their integration across diverse domains, including consulting and information retrieval. However, their widespread deployment also introduces substantial security risks, most notably in the form of prompt injection and jailbreak attacks. To systematically evaluate LLM vulnerabilities -- particularly to external prompt injection -- we conducted a series of experiments on eight commercial models. Each model was tested without supplementary sanitization, relying solely on its built-in safeguards. The results exposed exploitable weaknesses and emphasized the need for stronger security measures. Four categories of attacks were examined: direct injection, indirect (external) injection, image-based injection, and prompt leakage. Comparative analysis indicated that Claude 3 demonstrated relatively greater robustness; nevertheless, empirical findings confirm that additional defenses, such as input normalization, remain necessary to achieve reliable protection.
Similar Papers
Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions
Cryptography and Security
Computers can be tricked by hidden instructions.
Prompt-in-Content Attacks: Exploiting Uploaded Inputs to Hijack LLM Behavior
Cryptography and Security
Hides bad instructions in text to trick AI.
Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems
Cryptography and Security
Stops bad guys from stealing secrets from smart computer programs.