Are LLM Agents Behaviorally Coherent? Latent Profiles for Social Simulation
By: James Mooney, Josef Woldense, Zheng Robert Jia and more
Potential Business Impact:
AI can't reliably act like people in studies.
The impressive capabilities of Large Language Models (LLMs) have fueled the notion that synthetic agents can serve as substitutes for real participants in human-subject research. In an effort to evaluate the merits of this claim, social science researchers have largely focused on whether LLM-generated survey data corresponds to that of a human counterpart whom the LLM is prompted to represent. In contrast, we address a more fundamental question: Do agents maintain internal consistency, retaining similar behaviors when examined under different experimental settings? To this end, we develop a study designed to (a) reveal the agent's internal state and (b) examine agent behavior in a basic dialogue setting. This design enables us to explore a set of behavioral hypotheses to assess whether an agent's conversation behavior is consistent with what we would expect from their revealed internal state. Our findings on these hypotheses show significant internal inconsistencies in LLMs across model families and at differing model sizes. Most importantly, we find that, although agents may generate responses matching those of their human counterparts, they fail to be internally consistent, representing a critical gap in their capabilities to accurately substitute for real participants in human-subject research. Our simulation code and data are publicly accessible.
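To make the idea of an internal-consistency check concrete, here is a minimal sketch in Python. It is not the authors' simulation code. It assumes, purely for illustration, that each agent's revealed internal state is a self-reported stance on a 1-5 scale, that each dialogue between two agents yields an agreement score on a 1-5 scale, and that a coherent agent population should agree less as the gap between stances widens. The names `PairTrial`, `mean_agreement_by_gap`, and `is_monotone_non_increasing` are hypothetical, and the data are made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PairTrial:
    """One dialogue between two agents (all fields are illustrative assumptions)."""
    stance_a: int      # agent A's self-reported stance, assumed 1..5 scale
    stance_b: int      # agent B's self-reported stance, assumed 1..5 scale
    agreement: float   # agreement observed in the dialogue, assumed 1..5 scale


def mean_agreement_by_gap(trials):
    """Group trials by absolute stance gap and average the agreement scores."""
    by_gap = {}
    for t in trials:
        by_gap.setdefault(abs(t.stance_a - t.stance_b), []).append(t.agreement)
    return {gap: mean(scores) for gap, scores in sorted(by_gap.items())}


def is_monotone_non_increasing(values):
    """True if each value is no larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


# Made-up data: agreement should fall as the stance gap grows,
# but the gap-3 pair agrees more than the gap-1 pair.
trials = [
    PairTrial(1, 1, 4.5),
    PairTrial(2, 3, 4.0),
    PairTrial(1, 4, 4.2),
    PairTrial(1, 5, 3.0),
]

profile = mean_agreement_by_gap(trials)
coherent = is_monotone_non_increasing(list(profile.values()))
print(profile, coherent)
```

In this toy data the check fails because agreement rises between gap 1 and gap 3, which is the kind of mismatch between stated state and conversational behavior that the abstract calls internal inconsistency. A real analysis would need many trials per gap and a statistical test rather than a strict monotonicity check.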
Similar Papers
Simulating Online Social Media Conversations on Controversial Topics Using AI Agents Calibrated on Real-World Data
Social and Information Networks
Computers can now pretend to be people online.
Social Simulations with Large Language Model Risk Utopian Illusion
Computation and Language
Computers show fake, too-nice people in chats.
Can LLMs Generate Behaviors for Embodied Virtual Agents Based on Personality Traits?
Human-Computer Interaction
Makes computer characters act like real people.