Comment Traps: How Defective Commented-out Code Augments Defects in AI-Assisted Code Generation
By: Yuan Huang, Yukang Zhou, Xiangping Chen, and more
With the rapid development of large language models for code generation, AI-powered editors such as GitHub Copilot and Cursor are revolutionizing software development practices. At the same time, studies have identified potential defects in the code they generate. Previous research has predominantly examined how code context influences the generation of defective code, often overlooking the impact of defects within commented-out code (CO code). How AI coding assistants interpret CO code in their prompts directly affects the code they produce. This study evaluates how two AI coding assistants, GitHub Copilot and Cursor, are influenced by defective CO code. The experimental results show that defective CO code in the context causes the assistants to generate more defective code, with the proportion of defective outputs reaching up to 58.17 percent. Our findings further demonstrate that the tools do not simply copy the defective code from the context. Instead, they actively reason to complete incomplete defect patterns and continue to produce defective code despite distractions such as incorrect indentation or tags. Even with explicit instructions to ignore the defective CO code, the reduction in defects does not exceed 21.84 percent. These findings underscore the need for improved robustness and security measures in AI coding assistants.
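To make the notion of a "comment trap" concrete, the sketch below is a minimal, hypothetical illustration and not an example drawn from the paper: a function whose commented-out predecessor contains a SQL-injection defect. The function name, schema, and scenario are invented; the point is only to show the kind of defective CO code that, per the study, can steer an assistant toward reproducing the defect when completing the surrounding code.

```python
# Hypothetical "comment trap": defective commented-out (CO) code left in a file.
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user row by username."""
    # Old implementation, kept "for reference" but never deleted:
    #
    #     query = "SELECT * FROM users WHERE name = '" + username + "'"
    #     return conn.execute(query).fetchone()
    #
    # The CO code above builds SQL by string concatenation and is vulnerable
    # to SQL injection. An AI assistant asked to complete this function may
    # imitate the defective pattern from the comment instead of writing a
    # parameterized query like the one below.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (username,)
    ).fetchone()
```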
Similar Papers
Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability
Software Engineering
AI helps programmers code faster, not worse.
GitHub's Copilot Code Review: Can AI Spot Security Flaws Before You Commit?
Software Engineering
AI code checker misses big security problems.
Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity
Software Engineering
AI code has more security flaws than human code.