CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation
By: Runqi Sui
Potential Business Impact:
Makes AI say wrong things by tricking its memory.
Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by integrating external knowledge bases. However, this integration introduces a new security threat: adversaries can exploit the retrieval mechanism to inject malicious content into the knowledge base and thereby influence the generated responses. Building on this attack vector, we propose CtrlRAG, a novel attack method designed for RAG systems in the black-box setting, which matches real-world conditions. Unlike existing attack methods, CtrlRAG introduces a perturbation mechanism based on a Masked Language Model (MLM) that dynamically optimizes the malicious content in response to changes in the retrieved context. Experimental results demonstrate that CtrlRAG outperforms three baseline methods on both Emotional Manipulation and Hallucination Amplification objectives. Furthermore, we evaluate three existing defense mechanisms, revealing their limited effectiveness against CtrlRAG and underscoring the urgent need for more robust defenses.
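To make the MLM-based perturbation idea concrete, here is a minimal sketch of a single substitution step, assuming it resembles standard fill-mask candidate generation with an off-the-shelf BERT model. The function name perturb_once, the example sentence, and the filtering logic are hypothetical illustrations, not the paper's actual implementation.

```python
# Hypothetical sketch of one MLM-driven perturbation step, loosely modeled
# on the attack described in the abstract. Not the authors' code.
from transformers import pipeline

# A standard masked language model proposes fluent token substitutions.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def perturb_once(text: str, position: int, top_k: int = 5) -> list[str]:
    """Mask one token and return top-k MLM-proposed replacements."""
    tokens = text.split()
    original = tokens[position]
    tokens[position] = fill_mask.tokenizer.mask_token
    masked_text = " ".join(tokens)
    candidates = fill_mask(masked_text, top_k=top_k)
    # Keep fluent substitutes that differ from the original token.
    return [c["token_str"] for c in candidates if c["token_str"] != original]

# Example: propose replacements for the token at index 3 ("completely").
print(perturb_once("the vaccine is completely safe for children", 3))
```

In a full black-box loop, an attacker would score each candidate by how it shifts the RAG system's retrieval ranking or generated response, keep the best-scoring variant, and repeat as the retrieved context changes.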
Similar Papers
TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation
Computation and Language
Keeps AI answers honest by removing bad info.
RAG Security and Privacy: Formalizing the Threat Model and Attack Surface
Cryptography and Security
Protects secret information from AI that reads documents.
Secure Retrieval-Augmented Generation against Poisoning Attacks
Cryptography and Security
Stops bad info from tricking smart computer programs.