An Empirical Study on the Classification of Bug Reports with Machine Learning
By: Renato Andrade, César Teixeira, Nuno Laranjeiro and more
Potential Business Impact:
Helps computers sort out real software problems faster.
Software defects are a major threat to the reliability of computer systems. The literature shows that more than 30% of bug reports submitted in large software projects are misclassified (i.e., they are feature requests or mistakes made by the bug reporter), forcing developers to spend considerable effort inspecting them manually. Machine Learning algorithms can be used to classify issue reports automatically. Still, little is known about key aspects of training such models, such as the influence of programming languages and issue tracking systems. In this paper, we use a dataset containing more than 660,000 issue reports, collected from heterogeneous projects hosted in different issue tracking systems, to study how different factors (e.g., project language, report content) influence the performance of models that classify issue reports. Results show that performance does not differ significantly between using the report title and using the report description; that Support Vector Machine, Logistic Regression, and Random Forest are effective in classifying issue reports; that programming languages and issue tracking systems influence classification outcomes; and that models trained on heterogeneous projects can classify reports from projects not present during training. Based on these findings, we propose guidelines for future research, including recommendations for using heterogeneous data and selecting high-performing algorithms.
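The last finding, that models can classify reports from projects absent during training, implies an evaluation in which the test project is held out entirely. The abstract does not describe the authors' actual protocol, so the following is only a minimal sketch of one way such a split could be built. The `(project, text, label)` record layout, the function name `leave_one_project_out`, and the sample reports are all hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout (project, text, label); an assumption, not the paper's schema.
Report = Tuple[str, str, str]


def leave_one_project_out(
    reports: List[Report],
) -> Iterator[Tuple[str, List[Report], List[Report]]]:
    """Yield (held_out_project, train, test) with the test project excluded from train."""
    by_project: Dict[str, List[Report]] = defaultdict(list)
    for report in reports:
        by_project[report[0]].append(report)

    # Iterate projects in sorted order so the splits are deterministic.
    for held_out in sorted(by_project):
        train = [
            report
            for project, items in by_project.items()
            if project != held_out
            for report in items
        ]
        yield held_out, train, by_project[held_out]


# Made-up sample data for illustration only.
reports = [
    ("alpha", "Crash when saving file", "bug"),
    ("alpha", "Add dark mode", "non-bug"),
    ("beta", "NullPointerException on login", "bug"),
    ("gamma", "How do I configure the proxy?", "non-bug"),
]
splits = list(leave_one_project_out(reports))
```

Each `train` list could then be fed to any classifier (for example one of the three algorithms named in the abstract) and scored on the matching `test` list, which gives one cross-project score per held-out project.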
Similar Papers
Automatic techniques for issue report classification: A systematic mapping study
Software Engineering
Helps sort computer problems automatically.
Automated Bug Report Prioritization in Large Open-Source Projects
Software Engineering
Helps fix computer problems faster by sorting them.
Automated Duplicate Bug Report Detection in Large Open Bug Repositories
Software Engineering
Finds repeated bug reports automatically.