Description and Comparative Analysis of QuRE: A New Industrial Requirements Quality Dataset
By: Henning Femmer, Frank Houdek, Max Unterbusch, and others
Potential Business Impact:
Helps build better computer programs by checking instructions.
Requirements quality is central to successful software and systems engineering. Empirical research on quality defects in natural language requirements relies heavily on datasets, ideally as realistic and representative as possible. However, such datasets are often inaccessible, small, or lacking in detail. This paper introduces QuRE (Quality in Requirements), a new dataset comprising 2,111 industrial requirements annotated through a real-world review process. Previously used for over five years as part of an industrial contract, the dataset is now being released to the research community. In this work, we additionally provide descriptive statistics on the dataset, including measures such as lexical diversity and readability, and compare it to existing requirements datasets and to synthetically generated requirements. In contrast to synthetic datasets, QuRE is linguistically similar to existing ones. Unlike those datasets, however, QuRE comes with a detailed context description, and its labels have been created and applied systematically and extensively in an industrial context over a period of close to a decade. Our goal is to foster transparency, comparability, and empirical rigor by supporting the development of a common gold standard for requirements quality datasets. This, in turn, will enable more sound and collaborative research efforts in the field.
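The abstract mentions descriptive statistics such as lexical diversity and readability. As a minimal sketch of what such measures can look like, the snippet below computes a type-token ratio (one common lexical-diversity measure) and the Flesch Reading Ease score for a sample requirement; the paper's exact metric choices and tokenization may differ, and the example requirement text is invented for illustration.

```python
import re

def type_token_ratio(text: str) -> float:
    # Lexical diversity: unique tokens divided by total tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def count_syllables(word: str) -> int:
    # Crude syllable estimate: count runs of vowels (at least 1 per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease:
    # 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Hypothetical example requirement, not taken from the QuRE dataset.
req = "The system shall log every failed login attempt within one second."
print(round(type_token_ratio(req), 2))    # → 1.0 (all tokens unique)
print(round(flesch_reading_ease(req), 1))
```

Higher Flesch scores indicate easier text; the syllable counter here is only a rough heuristic, so real studies typically use a dedicated readability library.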
Similar Papers
SQuaD: The Software Quality Dataset
Software Engineering
Helps find software problems faster and earlier.
SynQuE: Estimating Synthetic Dataset Quality Without Annotations
Machine Learning (CS)
Chooses best fake data for computers to learn.
Light over Heavy: Automated Performance Requirements Quantification with Linguistic Inducement
Software Engineering
Finds computer problems faster and cheaper.