Securing Large Language Models (LLMs) from Prompt Injection Attacks
By: Omar Farooq Khan Suri, John McCrae
Potential Business Impact:
Makes AI safer from tricky instructions.
Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection attacks. These attacks leverage the model's instruction-following ability to make it perform malicious tasks. Recent work has proposed JATMO, a task-specific fine-tuning approach that trains non-instruction-tuned base models to perform a single function, thereby reducing susceptibility to adversarial instructions. In this study, we evaluate the robustness of JATMO against HOUYI, a genetic attack framework that systematically mutates and optimizes adversarial prompts. We adapt HOUYI by introducing custom fitness scoring, modified mutation logic, and a new harness for local model testing, enabling a more accurate assessment of defense effectiveness. We fine-tuned LLaMA 2-7B, Qwen1.5-4B, and Qwen1.5-0.5B models under the JATMO methodology and compared them with a fine-tuned GPT-3.5-Turbo baseline. Results show that while JATMO reduces attack success rates relative to instruction-tuned models, it does not fully prevent injections; adversaries exploiting multilingual cues or code-related disruptors still bypass defenses. We also observe a trade-off between generation quality and injection vulnerability, suggesting that better task performance often correlates with increased susceptibility. Our results highlight both the promise and limitations of fine-tuning-based defenses and point toward the need for layered, adversarially informed mitigation strategies.
Similar Papers
Multimodal Prompt Injection Attacks: Risks and Defenses for Modern LLMs
Cryptography and Security
Finds ways AI can be tricked.
Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models
Cryptography and Security
Makes AI safer from tricky instructions.
Exploiting Web Search Tools of AI Agents for Data Exfiltration
Cryptography and Security
Protects smart computer brains from being tricked.