Large Language Models in Thematic Analysis: Prompt Engineering, Evaluation, and Guidelines for Qualitative Software Engineering Research
By: Cristina Martinez Montes, Robert Feldt, Cristina Miguel Martos and more
Potential Business Impact:
Helps computers find patterns in people's words.
As artificial intelligence advances, large language models (LLMs) are entering qualitative research workflows, yet no reproducible methods exist for integrating them into established approaches like thematic analysis (TA), one of the most common qualitative methods in software engineering research. Moreover, existing studies lack systematic evaluation of LLM-generated qualitative outputs against established quality criteria. We designed and iteratively refined prompts for Phases 2-5 of Braun and Clarke's reflexive TA, then compared outputs from multiple LLMs against codes and themes produced by experienced researchers. Using 15 interviews on software engineers' well-being, we conducted blind evaluations with four expert evaluators who applied rubrics derived directly from Braun and Clarke's quality criteria. Evaluators preferred LLM-generated codes 61% of the time, finding them analytically useful for answering the research question. However, evaluators also identified limitations: LLMs fragmented data unnecessarily, missed latent interpretations, and sometimes produced themes with unclear boundaries. Our contributions are threefold. First, we provide a reproducible approach that integrates refined, documented prompts with an evaluation framework to operationalize Braun and Clarke's reflexive TA. Second, we present an empirical comparison of LLM- and human-generated codes and themes on software engineering data. Third, we offer guidelines for integrating LLMs into qualitative analysis while preserving methodological rigour, clarifying when and how LLMs can assist effectively and when human interpretation remains essential.
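The blind comparison described above can be illustrated with a minimal sketch. It is not the authors' evaluation pipeline: the function names `blind_pairs` and `preference_share`, the data layout, and the single-choice judgment format are all assumptions made for illustration. The idea is only to show how an evaluator might be shown a human-generated and an LLM-generated code in random order, and how a preference share such as the reported 61% could then be tallied.

```python
import random
from collections import Counter


def blind_pairs(items, seed=0):
    """Hide the source of each output by shuffling its presentation order.

    `items` is a list of (item_id, human_text, llm_text) tuples
    (a hypothetical layout). Returns a list of (item_id, options), where
    options is a two-element list of (source, text) in random order.
    Only the text would be shown to the evaluator.
    """
    rng = random.Random(seed)
    blinded = []
    for item_id, human_text, llm_text in items:
        options = [("human", human_text), ("llm", llm_text)]
        rng.shuffle(options)
        blinded.append((item_id, options))
    return blinded


def preference_share(blinded, choices):
    """Unblind the judgments and compute the share preferred per source.

    `choices` maps item_id -> 0 or 1, the position of the option the
    evaluator preferred.
    """
    tally = Counter(
        options[choices[item_id]][0] for item_id, options in blinded
    )
    total = sum(tally.values())
    return {source: tally[source] / total for source in ("human", "llm")}
```

In practice the `choices` mapping would come from the evaluators' rubric-based judgments; with several evaluators, one such mapping per evaluator could be tallied separately and then aggregated.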