Automated and Context-Aware Code Documentation Leveraging Advanced LLMs
By: Swapnil Sharma Sarker, Tanzina Taher Ifty
Potential Business Impact:
Writes helpful notes for computer code automatically.
Code documentation is essential for software maintainability and comprehension. Because writing documentation by hand is tedious, automated documentation generation has attracted considerable research. Existing automated approaches have focused primarily on code summarization, leaving a gap in template-based documentation generation (e.g., Javadoc), particularly with publicly available Large Language Models (LLMs). Progress in this area has also been hindered by the lack of a Javadoc-specific dataset that incorporates modern language features, provides broad framework/library coverage, and includes the necessary contextual information. This study addresses these gaps by developing a tailored dataset and assessing the capabilities of publicly available LLMs for context-aware, template-based Javadoc generation. We present a novel, context-aware dataset for Javadoc generation that includes critical structural and semantic information from modern Java codebases. We evaluate five open-source LLMs (LLaMA-3.1, Gemma-2, Phi-3, Mistral, and Qwen-2.5) in zero-shot, few-shot, and fine-tuned setups and provide a comparative analysis of their performance. Our results demonstrate that LLaMA-3.1 performs consistently well and is a reliable candidate for practical, automated Javadoc generation, offering a viable alternative to proprietary systems.
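To make the zero-shot and few-shot setups concrete, the following minimal sketch shows one way a context-aware Javadoc prompt could be assembled. The abstract does not give the prompt format or the dataset schema, so every name here (`MethodContext`, `Example`, `build_prompt`, the instruction wording, and the choice of class name and imports as context) is a hypothetical assumption, not the authors' method.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MethodContext:
    """Hypothetical record: one Java method plus assumed surrounding context."""
    class_name: str
    method_source: str
    imports: List[str] = field(default_factory=list)


@dataclass
class Example:
    """A (method, reference Javadoc) pair used as a few-shot demonstration."""
    method_source: str
    javadoc: str


# Assumed instruction text; the paper's actual wording is not given.
INSTRUCTION = (
    "Write a Javadoc comment for the Java method below. "
    "Use @param, @return, and @throws tags where they apply."
)


def build_prompt(target: MethodContext, examples: Sequence[Example] = ()) -> str:
    """Build a zero-shot prompt if `examples` is empty, else a few-shot prompt."""
    parts = [INSTRUCTION, ""]
    # Few-shot: prepend each demonstration as a method/Javadoc pair.
    for ex in examples:
        parts += ["Method:", ex.method_source, "Javadoc:", ex.javadoc, ""]
    # Context lines for the target method (assumed fields).
    parts.append(f"Enclosing class: {target.class_name}")
    if target.imports:
        parts.append("Imports: " + ", ".join(target.imports))
    # The prompt ends at "Javadoc:" so the model completes the comment.
    parts += ["Method:", target.method_source, "Javadoc:"]
    return "\n".join(parts)


demo = Example(
    method_source="public int add(int a, int b) { return a + b; }",
    javadoc=(
        "/**\n * Adds two integers.\n *\n * @param a the first addend\n"
        " * @param b the second addend\n * @return the sum of a and b\n */"
    ),
)
target = MethodContext(
    class_name="OrderService",
    method_source="public Order findById(long id) { return repository.get(id); }",
    imports=["java.util.Optional"],
)

zero_shot = build_prompt(target)          # no demonstrations
few_shot = build_prompt(target, [demo])   # one demonstration
```

The only difference between the two setups in this sketch is whether demonstration pairs precede the target method. A fine-tuned setup would instead train on such method/Javadoc pairs rather than placing them in the prompt.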