In Silico Development of Psychometric Scales: Feasibility of Representative Population Data Simulation with LLMs
By: Enrico Cipriani, Pavel Okopnyi, Danilo Menicucci and more
Potential Business Impact:
Lets computers create fake people for testing.
Developing and validating psychometric scales requires large samples, multiple testing phases, and substantial resources. Recent advances in Large Language Models (LLMs) enable the generation of synthetic participant data by prompting models to answer items while impersonating individuals of specific demographic profiles, potentially allowing in silico piloting before real data collection. Across four preregistered studies (N ≈ 300 each), we tested whether LLM-simulated datasets can reproduce the latent structures and measurement properties of human responses. In Studies 1 and 2, we compared LLM-generated data with real datasets for two validated scales; in Studies 3 and 4, we created new scales using exploratory factor analysis (EFA) on simulated data and then examined whether these structures generalized to newly collected human samples. Simulated datasets replicated the intended factor structures in three of four studies and showed consistent configural and metric invariance, with scalar invariance achieved for the two newly developed scales. However, correlation-based tests revealed substantial differences between real and synthetic datasets, and notable discrepancies appeared in score distributions and variances. Thus, while LLMs capture group-level latent structures, they do not approximate individual-level data properties. Simulated datasets also showed full internal invariance across gender. Overall, LLM-generated data appear useful for early-stage, group-level psychometric prototyping, but not as substitutes for individual-level validation. We discuss methodological limitations, risks of bias and data pollution, and ethical considerations related to in silico psychometric simulations.
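The persona-prompting step described in the abstract could look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' procedure: the `Persona` fields, the prompt wording, the 1 to 5 response format, and the helper names `build_prompt` and `parse_rating` are all hypothetical, and the call to an actual LLM is deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Persona:
    # Hypothetical demographic profile; the abstract does not list the fields used.
    age: int
    gender: str
    education: str


def build_prompt(persona: Persona, item: str,
                 scale_min: int = 1, scale_max: int = 5) -> str:
    """Compose a prompt asking a model to rate one item as the given persona."""
    return (
        f"You are a {persona.age}-year-old {persona.gender} "
        f"whose highest education is {persona.education}. "
        f"Rate the statement below from {scale_min} (strongly disagree) "
        f"to {scale_max} (strongly agree). Reply with one integer only.\n"
        f"Statement: {item}"
    )


def parse_rating(reply: str,
                 scale_min: int = 1, scale_max: int = 5) -> Optional[int]:
    """Return the first integer in the reply, or None if absent or out of range."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        return None
    value = int(match.group())
    return value if scale_min <= value <= scale_max else None


# Example usage with a made-up persona and item (no model is called here).
persona = Persona(age=34, gender="woman", education="a bachelor's degree")
prompt = build_prompt(persona, "I enjoy meeting new people.")
```

Repeating this over a demographically stratified set of personas and all scale items would yield a respondents-by-items matrix, which could then be passed to factor-analytic and invariance tests. Rejecting out-of-range or non-numeric replies, rather than clamping them, keeps malformed model output from silently distorting item variances.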
Similar Papers
Scaling Law in LLM Simulated Personality: More Detailed and Realistic Persona Profile Is All You Need
Computers and Society
Computers can now pretend to be people.
From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
Artificial Intelligence
Computers guess your personality from a few answers.
Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?
Computation and Language
Computers can't yet help make tests better.