Whose Personae? Synthetic Persona Experiments in LLM Research and Pathways to Transparency
By: Jan Batzner, Volker Stocker, Bingjun Tang, and more
Potential Business Impact:
Could help AI systems represent people more accurately and fairly.
Synthetic personae experiments have become a prominent method in Large Language Model alignment research, yet the representativeness and ecological validity of these personae vary considerably between studies. Through a review of 63 peer-reviewed studies published between 2023 and 2025 in leading NLP and AI venues, we reveal a critical gap: task and population of interest are often underspecified in persona-based experiments, despite personalization being fundamentally dependent on these criteria. Our analysis shows substantial differences in user representation, with most studies focusing on limited sociodemographic attributes and only 35% discussing the representativeness of their LLM personae. Based on our findings, we introduce a persona transparency checklist that emphasizes representative sampling, explicit grounding in empirical data, and enhanced ecological validity. Our work provides both a comprehensive assessment of current practices and practical guidelines to improve the rigor and ecological validity of persona-based evaluations in language model alignment research.
Similar Papers
Misalignment of LLM-Generated Personas with Human Perceptions in Low-Resource Settings
Computers and Society
AI-generated personas don't match how real people see themselves, especially in low-resource settings.
Population-Aligned Persona Generation for LLM-based Social Simulation
Computation and Language
Generates personas that match real populations for social simulations.
A Tale of Two Identities: An Ethical Audit of Human and AI-Crafted Personas
Computation and Language
Audits whether human- and AI-crafted personas fall back on stereotypes.