RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation
By: Vasilije Stambolic, Aritra Dhar, Lukas Cavigelli
Potential Business Impact:
Hackers trick AI into running bad code.
Retrieval-Augmented Generation (RAG) improves the reliability and trustworthiness of LLM responses and reduces hallucination, without requiring model retraining, by adding external data to the LLM's context. We develop a new class of black-box attack, RAG-Pull, that inserts hidden UTF characters into queries or external code repositories, redirecting retrieval toward malicious code and thereby breaking the model's safety alignment. We observe that perturbing queries or code alone can shift retrieval toward attacker-controlled snippets, while combining query and target perturbations achieves near-perfect success. Once retrieved, these snippets introduce exploitable vulnerabilities such as remote code execution and SQL injection. RAG-Pull's minimal perturbations can thus alter the model's safety alignment and increase its preference for unsafe code, opening up a new class of attacks on LLMs.
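The mechanism is easiest to see at the string level. Below is a minimal, self-contained Python sketch of the kind of perturbation the abstract describes: zero-width Unicode characters are injected into a query so that it renders identically to a human reader but produces different bytes and tokens, which can shift its embedding and therefore what the retriever ranks highest. The specific code points and the perturb_query helper are illustrative assumptions, not the paper's exact method.

```python
# Illustrative sketch only; the exact code points and insertion strategy
# used by RAG-Pull are assumptions, not taken from the paper.
ZERO_WIDTH = [
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
]

def perturb_query(query: str, positions: list[int], zw_index: int = 0) -> str:
    """Insert a zero-width character at each given index (hypothetical helper)."""
    zw = ZERO_WIDTH[zw_index]
    out, last = [], 0
    for pos in sorted(positions):
        out.append(query[last:pos])  # keep the visible text unchanged
        out.append(zw)               # splice in an invisible character
        last = pos
    out.append(query[last:])
    return "".join(out)

original = "how do I connect to a postgres database in python"
perturbed = perturb_query(original, positions=[6, 20])

print(original)                       # renders the same...
print(perturbed)                      # ...as this, in most terminals and editors
print(original == perturbed)          # False: the strings differ byte-wise
print(len(original), len(perturbed))  # lengths differ by the insertion count
```

Because dense retrievers embed the raw string, such invisible byte-level differences can move the query's embedding enough to change the nearest-neighbor ranking; an attacker who also plants matching perturbations in a target snippet can align the two, consistent with the abstract's observation that combined query-and-target perturbations are the most effective.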
Similar Papers
Secure Retrieval-Augmented Generation against Poisoning Attacks
Cryptography and Security
Stops bad info from tricking smart computer programs.
Adapting Large Language Models to Emerging Cybersecurity using Retrieval Augmented Generation
Cryptography and Security
Helps computers spot new cyber threats faster.