Reverse Engineering User Stories from Code using Large Language Models
By: Mohamed Ouf, Haoyu Li, Michael Zhang, and more
Potential Business Impact:
Helps teams understand legacy code by automatically recovering the user stories behind it.
User stories are essential in agile development, yet they are often missing or outdated in legacy and poorly documented systems. We investigate whether large language models (LLMs) can automatically recover user stories directly from source code, and how prompt design affects output quality. Using 1,750 annotated C++ snippets of varying complexity, we evaluate five state-of-the-art LLMs across six prompting strategies. Results show that all models achieve an average F1 score of 0.8 for code up to 200 NLOC (non-comment lines of code). Our findings show that a single illustrative example enables the smallest model (8B) to match the performance of a much larger 70B model. In contrast, structured reasoning via Chain-of-Thought offers only marginal gains, primarily for larger models.
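The abstract does not include the paper's prompt templates or scoring code, but the setup is straightforward to sketch. The Python sketch below illustrates two pieces of such a pipeline: (a) a one-shot prompt that pairs a single worked example with a new C++ snippet, matching the finding that one illustrative example lifts small-model performance, and (b) a token-overlap F1 between a generated user story and a reference annotation. The template wording, the example pair, and the token-level F1 definition are all illustrative assumptions, not the authors' actual artifacts.

```python
# Sketch only: one-shot prompt construction and a token-overlap F1 score.
# The template text and the F1 definition are assumptions for illustration,
# not the paper's actual prompts or metric implementation.

from collections import Counter

ONE_SHOT_EXAMPLE = (
    "Code:\n"
    "int add(int a, int b) { return a + b; }\n"
    "User story:\n"
    "As a user, I want to add two numbers so that I can see their sum.\n"
)

def build_one_shot_prompt(cpp_snippet: str) -> str:
    """Pair one worked example with the target snippet (one-shot prompting)."""
    return (
        "Recover the user story implemented by the following C++ code.\n\n"
        f"Example:\n{ONE_SHOT_EXAMPLE}\n"
        f"Code:\n{cpp_snippet}\n"
        "User story:"
    )

def token_f1(predicted: str, reference: str) -> float:
    """F1 over bag-of-words token overlap between two user stories."""
    pred, ref = predicted.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    print(build_one_shot_prompt("int mul(int a, int b) { return a * b; }"))
    print(token_f1(
        "As a user, I want to multiply two numbers so that I can see the product.",
        "As a user, I want to multiply two numbers to get their product.",
    ))
```

A zero-shot variant would simply drop the `Example:` block, and a Chain-of-Thought variant would ask the model to reason step by step before emitting the story; per the abstract, the latter helps mainly for larger models.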
Similar Papers
Leveraging LLMs for User Stories in AI Systems: UStAI Dataset
Software Engineering
Helps AI understand what people need from it.
Large Language Models for Fault Localization: An Empirical Study
Software Engineering
Finds bugs in computer code faster.
A Study on the Improvement of Code Generation Quality Using Large Language Models Leveraging Product Documentation
Software Engineering
Improves generated code by grounding it in product documentation.