Reverse Engineering User Stories from Code using Large Language Models

Published: September 23, 2025 | arXiv ID: 2509.19587v1

By: Mohamed Ouf, Haoyu Li, Michael Zhang, and more

Potential Business Impact:

Automatically recovers user stories from legacy source code, restoring missing documentation for agile teams.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

User stories are essential in agile development, yet they are often missing or outdated in legacy and poorly documented systems. We investigate whether large language models (LLMs) can automatically recover user stories directly from source code and how prompt design affects output quality. Using 1,750 annotated C++ snippets of varying complexity, we evaluate five state-of-the-art LLMs across six prompting strategies. All models achieve, on average, an F1 score of 0.8 for code up to 200 NLOC (non-comment lines of code). Notably, a single illustrative example enables the smallest model (8B parameters) to match the performance of a much larger 70B model, while structured reasoning via Chain-of-Thought offers only marginal gains, primarily for larger models.
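The one-shot setup the abstract describes, pairing a single illustrative (code, user story) example with the target snippet, can be sketched as a prompt builder. This is a hypothetical illustration: the example snippet, the example story, and the function name `build_one_shot_prompt` are assumptions, not taken from the paper's dataset or prompts.

```python
def build_one_shot_prompt(code: str) -> str:
    """Assemble a one-shot prompt: one illustrative (code, user story)
    pair followed by the target C++ snippet to analyze."""
    # Illustrative example pair (hypothetical, not from the paper).
    example_code = (
        "void addItem(Cart& cart, const Item& item) {\n"
        "    cart.items.push_back(item);\n"
        "    cart.total += item.price;\n"
        "}"
    )
    example_story = (
        "As a shopper, I want to add an item to my cart "
        "so that my order total is updated."
    )
    return (
        "Recover the user story implemented by the following C++ code.\n\n"
        f"Example code:\n{example_code}\n"
        f"Example user story: {example_story}\n\n"
        f"Target code:\n{code}\n"
        "User story:"
    )

# Usage: the returned string would be sent to an LLM as the prompt.
prompt = build_one_shot_prompt(
    "bool login(const std::string& user, const std::string& pw);"
)
```

A zero-shot variant would omit the example pair; the paper's finding is that including even one such pair lets an 8B model match a 70B model on this task.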

Country of Origin
🇨🇦 Canada

Page Count
6 pages

Category
Computer Science:
Software Engineering