Impact of Comments on LLM Comprehension of Legacy Code
By: Rock Sabetto, Emily Escamilla, Devesh Agarwal, and more
Potential Business Impact:
Helps computers understand old computer code.
Large language models (LLMs) have been increasingly integrated into software engineering and maintenance tasks due to their strong performance on such tasks and their robust understanding of modern programming languages. However, the ability of LLMs to comprehend code written in legacy languages remains a research gap, complicated by real-world legacy systems whose documentation is often missing or inaccurate in ways that may impact LLM comprehension. Objectively measuring LLM comprehension of legacy languages requires an efficient, quantitative evaluation method. We leverage multiple-choice question answering (MCQA), an emerging LLM evaluation methodology, to evaluate LLM comprehension of legacy code and the impact of comment prevalence and inaccurate comments. In this work, we present preliminary findings on the impact of documentation on LLM comprehension of legacy code and outline strategic objectives for future work.
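To make the MCQA methodology concrete, the sketch below shows one minimal way such an evaluation harness could be structured. It is an illustrative assumption, not the authors' actual benchmark: the item fields, the COBOL-style snippet, and the stand-in prediction function are all hypothetical.

```python
# Hypothetical MCQA evaluation harness for code comprehension.
# The item schema, example snippet, and scoring are illustrative
# assumptions, not the paper's actual benchmark.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MCQAItem:
    code: str             # legacy code snippet shown to the model
    question: str         # comprehension question about the snippet
    choices: Dict[str, str]  # choice label -> answer text
    answer: str           # correct choice label


def score_mcqa(items: List[MCQAItem],
               predict: Callable[[MCQAItem], str]) -> float:
    """Return the fraction of items where `predict` picks the correct label."""
    correct = sum(1 for item in items if predict(item) == item.answer)
    return correct / len(items)


# Toy item using a COBOL-style statement (illustrative only).
item = MCQAItem(
    code="ADD 1 TO WS-COUNTER.",
    question="What does this statement do?",
    choices={"A": "Increments WS-COUNTER by 1",
             "B": "Resets WS-COUNTER to 1"},
    answer="A",
)

# Stand-in "model" that always answers "A"; a real harness would prompt
# an LLM with the code, question, and labeled choices, then parse its reply.
accuracy = score_mcqa([item], lambda it: "A")
print(accuracy)
```

Because each item has a single correct label, accuracy over a question set gives the quantitative comprehension signal the abstract calls for, and comment prevalence or accuracy can be varied in the `code` field to measure their impact.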
Similar Papers
How Accurately Do Large Language Models Understand Code?
Software Engineering
Tests if computers truly understand code.
"I Would Have Written My Code Differently'': Beginners Struggle to Understand LLM-Generated Code
Software Engineering
Helps new coders understand computer-written code.
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
Software Engineering
Makes computer code easier to understand and write.