Description and Comparative Analysis of QuRE: A New Industrial Requirements Quality Dataset
By: Henning Femmer, Frank Houdek, Max Unterbusch, and others
Potential Business Impact:
Helps build better computer programs by checking instructions.
Requirements quality is central to successful software and systems engineering. Empirical research on quality defects in natural language requirements relies heavily on datasets, ideally as realistic and representative as possible. However, such datasets are often inaccessible, small, or lacking in detail. This paper introduces QuRE (Quality in Requirements), a new dataset comprising 2,111 industrial requirements annotated through a real-world review process. Previously used for over five years as part of an industrial contract, the dataset is now being released to the research community. In this work, we additionally provide descriptive statistics on the dataset, including measures such as lexical diversity and readability, and compare it to existing requirements datasets and to synthetically generated requirements. In contrast to synthetic datasets, QuRE is linguistically similar to existing ones. Unlike those datasets, however, QuRE comes with a detailed context description, and its labels have been created and applied systematically and extensively in an industrial context over a period of close to a decade. Our goal is to foster transparency, comparability, and empirical rigor by supporting the development of a common gold standard for requirements quality datasets. This, in turn, will enable more sound and collaborative research efforts in the field.
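The abstract mentions descriptive statistics such as lexical diversity and readability. As a minimal sketch of what such measures can look like, the snippet below computes a type-token ratio (one common lexical-diversity measure) and the Flesch Reading Ease score for a sample requirement; the paper's exact metric choices and tokenization may differ, and the example requirement text is invented for illustration.

```python
import re

def type_token_ratio(text: str) -> float:
    # Lexical diversity: unique tokens divided by total tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def count_syllables(word: str) -> int:
    # Crude syllable estimate: count runs of vowels (at least 1 per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease:
    # 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Hypothetical example requirement, not taken from the QuRE dataset.
req = "The system shall log every failed login attempt within one second."
print(round(type_token_ratio(req), 2))    # → 1.0 (all tokens unique)
print(round(flesch_reading_ease(req), 1))
```

Higher Flesch scores indicate easier text; the syllable counter here is only a rough heuristic, so real studies typically use a dedicated readability library.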
Similar Papers
SQuaD: The Software Quality Dataset
Software Engineering
Helps find software problems faster and earlier.
SynQuE: Estimating Synthetic Dataset Quality Without Annotations
Machine Learning (CS)
Chooses best fake data for computers to learn.
Light over Heavy: Automated Performance Requirements Quantification with Linguistic Inducement
Software Engineering
Finds computer problems faster and cheaper.