WildCode: An Empirical Analysis of Code Generated by ChatGPT

Published: December 3, 2025 | arXiv ID: 2512.04259v1

By: Kobra Khanmohammadi, Pooria Roy, Raphael Khoury, and more

Potential Business Impact:

AI-generated code is often insecure and may be unsafe to deploy without review.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large language models (LLMs) are increasingly used to generate code, but the quality and security of that code are often uncertain. Several recent studies have raised alarms, indicating that AI-generated code may be particularly vulnerable to cyberattacks. However, most of these studies rely on code generated specifically for the study, which raises questions about the realism of such experiments. In this study, we perform a large-scale empirical analysis of real-life code generated by ChatGPT. We evaluate the generated code with respect to both correctness and security, and we examine the intentions of the users who request code from the model. Our research confirms previous studies based on synthetic queries and yields evidence that LLM-generated code is often inadequate with respect to security. We also find that users exhibit little curiosity about the security of the code they ask LLMs to generate, as evidenced by their lack of queries on this topic.

Country of Origin
🇨🇦 Canada


Page Count
21 pages

Category
Computer Science:
Cryptography and Security