A Reproducible Framework for Neural Topic Modeling in Focus Group Analysis
By: Heger Arfaoui, Mohammed Iheb Hergli, Beya Benzina, and more
Potential Business Impact:
Speeds up the analysis of focus group discussions by applying neural topic modeling in place of manual coding.
Focus group discussions generate rich qualitative data but their analysis traditionally relies on labor-intensive manual coding that limits scalability and reproducibility. We present a rigorous, reproducible computational framework for applying neural topic modeling to focus group transcripts, addressing fundamental methodological challenges: hyperparameter sensitivity, model stability, and validation of interpretability. Using BERTopic applied to ten focus groups exploring HPV vaccine perceptions in Tunisia (1,076 utterances), we conducted systematic evaluation across 27 hyperparameter configurations, assessed stability through bootstrap resampling with 30 replicates per configuration, and validated interpretability through formal human evaluation by three domain experts. Our analysis demonstrates substantial sensitivity to hyperparameter choices and reveals that metric selection for stability assessment must align with analytical goals. A hierarchical merging strategy (extracting fine-grained topics for stability then consolidating for interpretability) effectively navigates the stability-coherence tradeoff, achieving coherence of 0.558 compared to 0.539 for direct extraction. Human validation confirmed topic quality with very good inter-rater reliability (ICC = 0.79, weighted Cohen's kappa = 0.578). Our framework provides practical guidelines that researchers can adapt to their own qualitative research contexts. All code, data processing scripts, and evaluation protocols are publicly available to support reproduction and extension of this work.
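The stability protocol described above (bootstrap resampling with 30 replicates per configuration) can be sketched in plain Python. This is a minimal illustration, not the authors' released code: `fit_topics` is a hypothetical stand-in for a full BERTopic fit, and the greedy Jaccard matching of topic word lists is one simple choice of stability metric among several the paper's point about metric selection would apply to.

```python
import random
from itertools import combinations

def topic_jaccard(t1, t2):
    """Jaccard similarity between two topics' top-word sets."""
    s1, s2 = set(t1), set(t2)
    return len(s1 & s2) / len(s1 | s2)

def replicate_stability(rep_a, rep_b):
    """Greedily match each topic in rep_a to its most similar topic
    in rep_b and average the Jaccard scores of the matched pairs."""
    scores = []
    remaining = list(rep_b)
    for topic in rep_a:
        if not remaining:
            break
        best = max(remaining, key=lambda t: topic_jaccard(topic, t))
        scores.append(topic_jaccard(topic, best))
        remaining.remove(best)
    return sum(scores) / len(scores) if scores else 0.0

def bootstrap_stability(utterances, fit_topics, n_replicates=30, seed=0):
    """Refit the model on bootstrap resamples of the utterances and
    report the mean pairwise topic stability across replicates.

    fit_topics: callable mapping a list of utterances to a list of
    topics, each topic given as a list of its top words (a stand-in
    for fitting a topic model such as BERTopic)."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        sample = [rng.choice(utterances) for _ in utterances]
        replicates.append(fit_topics(sample))
    pair_scores = [replicate_stability(a, b)
                   for a, b in combinations(replicates, 2)]
    return sum(pair_scores) / len(pair_scores)
```

A perfectly stable model (one that returns the same topics on every resample) scores 1.0 under this metric; instability shows up as lower average word overlap between matched topics across replicates.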
Similar Papers
Automating Historical Insight Extraction from Large-Scale Newspaper Archives via Neural Topic Modeling
Computation and Language
Finds hidden stories in old newspapers.
A Reproducible, Scalable Pipeline for Synthesizing Autoregressive Model Literature
Information Retrieval
Automates finding and re-running AI research.