Text2Stories: Evaluating the Alignment Between Stakeholder Interviews and Generated User Stories
By: Francesco Dente, Fabiano Dalpiaz, Paolo Papotti
Potential Business Impact:
Checks whether automatically generated user stories match what stakeholders actually asked for in interviews.
Large language models (LLMs) can be employed to automate the generation of software requirements from natural language inputs such as transcripts of elicitation interviews. However, evaluating whether the derived requirements faithfully reflect the stakeholders' needs remains a largely manual task. We introduce Text2Stories, a task and metrics for text-to-story alignment that quantify the extent to which requirements (in the form of user stories) match the actual needs expressed by the elicitation session participants. Given an interview transcript and a set of user stories, our metrics quantify (i) correctness: the proportion of stories supported by the transcript, and (ii) completeness: the proportion of the transcript covered by at least one story. We segment the transcript into text chunks and cast the alignment as a matching problem between chunks and stories. Experiments on four datasets show that an LLM-based matcher achieves 0.86 macro-F1 on held-out annotations, while embedding models alone lag behind but enable effective blocking. Finally, we show how our metrics enable comparisons across sets of stories (e.g., human-written vs. generated), positioning Text2Stories as a scalable, source-faithful complement to existing user-story quality criteria.
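The abstract defines correctness and completeness as ratios over a chunk-story matching. The sketch below shows how those two numbers could be computed once a matcher (LLM-based or embedding-based) has produced binary chunk-story decisions; it is a minimal illustration under that assumption, and the identifiers (`alignment_metrics`, `matched_pairs`, the example chunks and stories) are hypothetical, not artifacts from the paper.

```python
from typing import Iterable, Set, Tuple


def alignment_metrics(
    chunks: Iterable[str],
    stories: Iterable[str],
    matched_pairs: Set[Tuple[int, int]],
) -> Tuple[float, float]:
    """Compute (correctness, completeness) from chunk-story match decisions.

    matched_pairs holds (chunk_index, story_index) pairs the matcher judged aligned.
    correctness  = fraction of stories supported by at least one transcript chunk
    completeness = fraction of transcript chunks covered by at least one story
    """
    chunks = list(chunks)
    stories = list(stories)

    supported_stories = {j for (_, j) in matched_pairs}
    covered_chunks = {i for (i, _) in matched_pairs}

    correctness = len(supported_stories) / len(stories) if stories else 0.0
    completeness = len(covered_chunks) / len(chunks) if chunks else 0.0
    return correctness, completeness


if __name__ == "__main__":
    # Illustrative (made-up) transcript chunks and user stories.
    chunks = [
        "The librarian wants to search books by author.",
        "Members should receive overdue reminders by email.",
    ]
    stories = [
        "As a librarian, I want to search the catalog by author.",
        "As a member, I want email reminders for overdue books.",
        "As an admin, I want to export usage reports.",  # not grounded in the transcript
    ]
    matches = {(0, 0), (1, 1)}
    correctness, completeness = alignment_metrics(chunks, stories, matches)
    print(f"correctness={correctness:.2f}, completeness={completeness:.2f}")
    # correctness=0.67 (2 of 3 stories supported), completeness=1.00 (both chunks covered)
```

In this sketch the third story lowers correctness because no chunk supports it, while completeness stays at 1.0 since every chunk is covered; the embedding-based blocking mentioned in the abstract would only restrict which (chunk, story) pairs are handed to the matcher, not how the two ratios are computed.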
Similar Papers
Leveraging LLMs for User Stories in AI Systems: UStAI Dataset
Software Engineering
Helps AI understand what people need from it.
Reverse Engineering User Stories from Code using Large Language Models
Software Engineering
Helps computers understand old code by writing stories.
Topic-aware Large Language Models for Summarizing the Lived Healthcare Experiences Described in Health Stories
Computers and Society
Helps doctors understand patient stories better.