Good News for Script Kiddies? Evaluating Large Language Models for Automated Exploit Generation
By: David Jin, Qian Fu, Yuekang Li
Potential Business Impact:
AI can write code to break computer programs.
Large Language Models (LLMs) have demonstrated remarkable capabilities in code-related tasks, raising concerns about their potential for automated exploit generation (AEG). This paper presents the first systematic study on LLMs' effectiveness in AEG, evaluating both their cooperativeness and technical proficiency. To mitigate dataset bias, we introduce a benchmark with refactored versions of five software security labs. Additionally, we design an LLM-based attacker to systematically prompt LLMs for exploit generation. Our experiments reveal that GPT-4 and GPT-4o exhibit high cooperativeness, comparable to uncensored models, while Llama3 is the most resistant. However, no model successfully generates exploits for refactored labs, though GPT-4o's minimal errors highlight the potential for LLM-driven AEG advancements.
Similar Papers
Prompt to Pwn: Automated Exploit Generation for Smart Contracts
Cryptography and Security
Finds software bugs automatically to prevent hacks.
Large Language Models for Multilingual Vulnerability Detection: How Far Are We?
Software Engineering
Finds hidden computer bugs in many languages.
LLM-GUARD: Large Language Model-Based Detection and Repair of Bugs and Security Vulnerabilities in C++ and Python
Software Engineering
Finds simple and some tricky computer bugs.