Score: 1

A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks

Published: September 16, 2025 | arXiv ID: 2509.14285v1

By: S M Asif Hossain , Ruksat Khan Shayoni , Mohd Ruhul Ameen and more

Potential Business Impact:

Stops bad instructions from tricking smart computer programs.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

Prompt injection attacks represent a major vulnerability in Large Language Model (LLM) deployments, where malicious instructions embedded in user inputs can override system prompts and induce unintended behaviors. This paper presents a novel multi-agent defense framework that employs specialized LLM agents in coordinated pipelines to detect and neutralize prompt injection attacks in real-time. We evaluate our approach using two distinct architectures: a sequential chain-of-agents pipeline and a hierarchical coordinator-based system. Our comprehensive evaluation on 55 unique prompt injection attacks, grouped into 8 categories and totaling 400 attack instances across two LLM platforms (ChatGLM and Llama2), demonstrates significant security improvements. Without defense mechanisms, baseline Attack Success Rates (ASR) reached 30% for ChatGLM and 20% for Llama2. Our multi-agent pipeline achieved 100% mitigation, reducing ASR to 0% across all tested scenarios. The framework demonstrates robustness across multiple attack categories including direct overrides, code execution attempts, data exfiltration, and obfuscation techniques, while maintaining system functionality for legitimate queries.