Wild SBOMs: a Large-scale Dataset of Software Bills of Materials from Public Code
By: Luís Soeiro, Thomas Robert, Stefano Zacchiroli
Potential Business Impact:
Helps software builders track code parts safely.
Developers gain productivity by reusing readily available Free and Open Source Software (FOSS) components. This practice also brings difficulties, such as managing licensing, components, and related security. One approach to handling these difficulties is to use Software Bills of Materials (SBOMs). While there have been studies on the readiness of practitioners to embrace SBOMs and on the SBOM tool ecosystem, a large-scale study of SBOM practices based on SBOM files produced in the wild is still lacking. A starting point for such a study is a large dataset of SBOM files found in the wild. We introduce such a dataset, consisting of over 78 thousand unique SBOM files, deduplicated from those found in over 94 million repositories. We include metadata covering the standard and format used, the quality score generated by the tool sbomqs, the number of revisions, filenames, and provenance information. Finally, we give suggestions and examples of research that could bring new insights into assessing and improving real-world SBOM practices.
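To make the deduplication and metadata steps concrete, here is a minimal sketch of how candidate files could be classified by standard and format and then deduplicated by content hash. It is an illustration under stated assumptions, not the authors' pipeline: the function names `detect_sbom` and `deduplicate`, the record layout, and the choice of SHA-256 are all hypothetical. The detection relies only on well-known markers (the `bomFormat` field in CycloneDX JSON, the `spdxVersion` field in SPDX JSON, and the `SPDXVersion:` line in SPDX tag-value files) and ignores other serializations such as XML.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def detect_sbom(text: str) -> Optional[Tuple[str, str]]:
    """Return (standard, format) for a few well-known SBOM markers, else None.

    Hypothetical helper: covers CycloneDX JSON, SPDX JSON and SPDX tag-value only.
    """
    try:
        doc = json.loads(text)
    except ValueError:
        doc = None
    if isinstance(doc, dict):
        if doc.get("bomFormat") == "CycloneDX":
            return ("CycloneDX", "json")
        if str(doc.get("spdxVersion", "")).startswith("SPDX-"):
            return ("SPDX", "json")
        return None
    # SPDX tag-value documents declare their version on an "SPDXVersion:" line.
    for line in text.splitlines():
        if line.strip().startswith("SPDXVersion:"):
            return ("SPDX", "tag-value")
    return None


def deduplicate(files: Dict[str, str]) -> Dict[str, dict]:
    """Group SBOM files by content hash, keeping every filename seen per hash.

    `files` maps a path to the file's text. The SHA-256 digest is an assumed
    stand-in for whatever content identifier a real pipeline would use.
    """
    unique: Dict[str, dict] = {}
    for path, text in files.items():
        kind = detect_sbom(text)
        if kind is None:
            continue  # not recognized as an SBOM by this simple check
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        entry = unique.setdefault(
            digest,
            {"standard": kind[0], "format": kind[1], "filenames": []},
        )
        entry["filenames"].append(path)
    return unique
```

Calling `deduplicate` on a mapping of paths to file contents returns one record per distinct SBOM, each listing the filenames under which that content appeared. A fuller version would add the remaining metadata the abstract mentions, such as the quality score, the number of revisions, and provenance.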
Similar Papers
A Dataset of Software Bill of Materials for Evaluating SBOM Consumption Tools
Software Engineering
Helps find hidden problems in computer code.
Policy-driven Software Bill of Materials on GitHub: An Empirical Study
Software Engineering
Finds security problems in computer code.
Augmenting Software Bills of Materials with Software Vulnerability Description: A Preliminary Study on GitHub
Software Engineering
Finds software problems before they cause trouble.