Detecting Prompt Injection Attacks Against Applications Using Classifiers
By: Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid, and more
Potential Business Impact:
Stops malicious prompts from breaking LLM-powered applications.
Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection dataset based on the HackAPrompt Playground Submissions corpus and trains several classifiers, including LSTMs, feed-forward neural networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM-integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems.
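The abstract names the classifier families but not the feature representation or pipeline details. As a rough illustration only, the sketch below trains two of the named classifiers (Naive Bayes and Random Forest) on TF-IDF character n-grams over a labeled prompt corpus. The dataset filename, column names, and all hyperparameters are assumptions for illustration, not the authors' actual setup.

```python
# Minimal prompt-injection classifier sketch, NOT the paper's exact pipeline.
# Assumes a labeled CSV with columns "prompt" and "label" (0 = benign,
# 1 = injection), e.g. derived from the HackAPrompt Playground Submissions
# corpus. The filename and split ratio below are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

df = pd.read_csv("prompt_injection_dataset.csv")  # hypothetical file
X_train, X_test, y_train, y_test = train_test_split(
    df["prompt"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# Character n-grams are one common choice for this task: they can catch
# obfuscated payloads (spacing tricks, leetspeak) that word tokens miss.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), max_features=50_000)
Xtr = vectorizer.fit_transform(X_train)
Xte = vectorizer.transform(X_test)

# Fit and compare two of the classifier families named in the abstract.
for name, clf in [
    ("Naive Bayes", MultinomialNB()),
    ("Random Forest", RandomForestClassifier(n_estimators=200, random_state=42)),
]:
    clf.fit(Xtr, y_train)
    print(f"=== {name} ===")
    print(classification_report(y_test, clf.predict(Xte)))
```

A deployed detector would sit in front of the LLM and reject or flag prompts the classifier scores as malicious; the LSTM and feed-forward variants mentioned in the abstract would replace the bag-of-n-grams features with learned sequence representations.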
Similar Papers
Privacy-Preserving Prompt Injection Detection for LLMs Using Federated Learning and Embedding-Based NLP Classification
Cryptography and Security
Detects prompt injection attacks on LLMs while keeping user data private.
Cybersecurity AI: Hacking the AI Hackers via Prompt Injection
Cryptography and Security
AI-powered hacking tools can themselves be subverted via prompt injection.
Multimodal Prompt Injection Attacks: Risks and Defenses for Modern LLMs
Cryptography and Security
Examines multimodal prompt injection risks and defenses for modern LLMs.