Hallucinations in Code Change to Natural Language Generation: Prevalence and Evaluation of Detection Metrics
By: Chunhua Liu, Hong Yi Lin, Patanamon Thongtanunam
Potential Business Impact:
Finds made-up statements in AI-written descriptions of code changes.
Language models have shown strong capabilities across a wide range of software engineering tasks, such as code generation, yet they suffer from hallucinations. While hallucinations have been studied independently in natural language and code generation, their occurrence in tasks involving code changes, whose format is structurally complex and context-dependent, remains largely unexplored. This paper presents the first comprehensive analysis of hallucinations in two critical code-change-to-natural-language generation tasks: commit message generation and code review comment generation. We quantify the prevalence of hallucinations in recent language models and explore a range of metric-based approaches to automatically detect them. Our findings reveal that approximately 50% of generated code reviews and 20% of generated commit messages contain hallucinations. While commonly used metrics are weak detectors on their own, combining multiple metrics substantially improves performance. Notably, model confidence and feature attribution metrics contribute effectively to hallucination detection, showing promise for inference-time detection. All code and data will be released upon acceptance.
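The abstract's idea of combining several weak metric signals into one stronger detector can be sketched as a simple weighted vote. The metric names, weights, and threshold below are illustrative assumptions, not the paper's actual setup or results.

```python
# Hedged sketch: combining per-sample detection signals into a single
# hallucination score. All names, weights, and the threshold are
# hypothetical; the paper's actual metrics and combination differ.
from dataclasses import dataclass


@dataclass
class MetricScores:
    similarity: float   # overlap with the code change, 0..1 (higher = more grounded)
    confidence: float   # model's mean token probability, 0..1
    attribution: float  # share of feature-attribution mass on the input code, 0..1


def hallucination_score(m: MetricScores,
                        weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted combination: low grounding on each signal raises the score."""
    return (weights[0] * (1 - m.similarity)
            + weights[1] * (1 - m.confidence)
            + weights[2] * (1 - m.attribution))


def is_hallucination(m: MetricScores, threshold: float = 0.5) -> bool:
    return hallucination_score(m) > threshold


# A generation all three signals agree is well grounded:
ok = MetricScores(similarity=0.8, confidence=0.9, attribution=0.7)
# A generation every signal flags:
bad = MetricScores(similarity=0.2, confidence=0.3, attribution=0.1)
print(is_hallucination(ok), is_hallucination(bad))  # False True
```

In practice the weights and threshold would be fit on labeled data (e.g. with a small classifier) rather than hand-picked, which is what makes the combined detector outperform any single metric.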
Similar Papers
A Systematic Literature Review of Code Hallucinations in LLMs: Characterization, Mitigation Methods, Challenges, and Future Directions for Reliable AI
Software Engineering
Fixes computer code mistakes made by AI.
Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges
Software Engineering
Finds and fixes mistakes in computer code.
Hallucination in LLM-Based Code Generation: An Automotive Case Study
Software Engineering
Helps computers write car software correctly.