Multi-Artifact Analysis of Self-Admitted Technical Debt in Scientific Software
By: Eric L. Melin, Nasir U. Eisty, Gregory Watson and more
Potential Business Impact:
Finds hidden problems in science computer code.
Context: Self-admitted technical debt (SATD) occurs when developers acknowledge shortcuts in code. In scientific software (SSW), such debt poses unique risks to the validity and reproducibility of results. Objective: This study aims to identify, categorize, and evaluate scientific debt, a specialized form of SATD in SSW, and assess the extent to which traditional SATD categories capture these domain-specific issues. Method: We conduct a multi-artifact analysis across code comments, commit messages, pull requests, and issue trackers from 23 open-source SSW projects. We construct and validate a curated dataset of scientific debt, develop a multi-source SATD classifier, and conduct a practitioner validation to assess the practical relevance of scientific debt. Results: Our classifier performs strongly across 900,358 artifacts from 23 SSW projects. SATD is most prevalent in pull requests and issue trackers, underscoring the value of multi-artifact analysis. Models trained on traditional SATD often miss scientific debt, emphasizing the need for its explicit detection in SSW. Practitioner validation confirmed that scientific debt is both recognizable and useful in practice. Conclusions: Scientific debt represents a unique form of SATD in SSW that is not adequately captured by traditional categories and requires specialized identification and management. Our dataset, classification analysis, and practitioner validation results provide the first formal multi-artifact perspective on scientific debt, highlighting the need for tailored SATD detection approaches in SSW.
Similar Papers
Exploring Scientific Debt: Harnessing AI for SATD Identification in Scientific Software
Software Engineering
Finds hidden problems in science computer code.
Self-Admitted Technical Debt in LLM Software: An Empirical Comparison with ML and Non-ML Software
Software Engineering
Finds new coding problems in AI programs.