WildCode: An Empirical Analysis of Code Generated by ChatGPT
By: Kobra Khanmohammadi, Pooria Roy, Raphael Khoury, and more
Potential Business Impact:
AI-generated code is often insecure, exposing the systems that run it to attack.
LLMs are increasingly used to generate code, but the quality and security of this code are often uncertain. Several recent studies have raised alarm bells, indicating that such AI-generated code may be particularly vulnerable to cyberattacks. However, most of these studies rely on code that is generated specifically for the study, which raises questions about the realism of such experiments. In this study, we perform a large-scale empirical analysis of real-life code generated by ChatGPT. We evaluate code generated by ChatGPT with respect to both correctness and security, and delve into the intentions of users who request code from the model. Our research confirms the findings of previous studies that used synthetic queries, yielding further evidence that LLM-generated code is often inadequate with respect to security. We also find that users exhibit little curiosity about the security features of the code they ask LLMs to generate, as evidenced by their lack of queries on this topic.
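The abstract does not describe the tooling behind the security evaluation. As a purely illustrative sketch, and not the authors' method, the Python snippet below shows one way a corpus of extracted ChatGPT-generated Python snippets could be bulk-scanned with an off-the-shelf static analyzer such as Bandit; the directory name and workflow are assumptions made for this example.

# Illustrative sketch only (not from the paper): bulk-scan LLM-generated
# Python snippets for security issues using the Bandit CLI.
# Assumes Bandit is installed (pip install bandit) and that each snippet
# has been saved as a .py file under the hypothetical SNIPPET_DIR.
import json
import subprocess
from collections import Counter
from pathlib import Path

SNIPPET_DIR = Path("chatgpt_snippets")  # hypothetical directory of extracted code

def scan_snippets(snippet_dir: Path) -> Counter:
    """Run Bandit recursively over the snippets and tally issues by test ID."""
    result = subprocess.run(
        ["bandit", "-r", str(snippet_dir), "-f", "json", "-q"],
        capture_output=True,
        text=True,
    )
    report = json.loads(result.stdout)
    return Counter(issue["test_id"] for issue in report.get("results", []))

if __name__ == "__main__":
    counts = scan_snippets(SNIPPET_DIR)
    for test_id, n in counts.most_common():
        print(f"{test_id}: {n}")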
Similar Papers
The Hidden Risks of LLM-Generated Web Application Code: A Security-Centric Evaluation of Code Generation Capabilities in Large Language Models
Cryptography and Security
Finds security flaws in computer code made by AI.
Using LLMs for Security Advisory Investigations: How Far Are We?
Cryptography and Security
Assesses how well LLMs investigate security advisories and shows they can be misled.
LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models
Artificial Intelligence
Finds security problems in computer code made by AI.