ChatGPT is not A Man but Das Man: Representativeness and Structural Consistency of Silicon Samples Generated by Large Language Models
By: Dai Li, Linzhuo Li, Huilian Sophie Qiu
Potential Business Impact:
AI chatbots may not reflect real people's opinions.
Large language models (LLMs) deployed as chatbots, such as ChatGPT and Llama, are increasingly proposed as "silicon samples" for simulating human opinions. This study examines that notion, arguing that LLMs may misrepresent population-level opinions. We identify two fundamental challenges: a failure of structural consistency, where accuracy at one level of demographic aggregation does not carry over to other levels, and homogenization, an underrepresentation of minority opinions. To investigate these, we prompted ChatGPT (GPT-4) and Meta's Llama 3.1 series (8B, 70B, 405B) with questions on abortion and unauthorized immigration from the American National Election Studies (ANES) 2020. Our findings reveal significant structural inconsistencies and severe homogenization in LLM responses compared with human data. We propose an "accuracy-optimization hypothesis," suggesting that homogenization stems from models prioritizing the most common (modal) responses. These issues challenge the validity of using LLMs, especially AI chatbots, as direct substitutes for human survey data, and risk reinforcing stereotypes and misinforming policy.
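The study's setup can be pictured as persona-conditioned prompting followed by a comparison of response distributions against human survey data. Below is a minimal Python sketch of that idea; the persona fields, the prompt wording, the entropy-based homogenization measure, and the response tallies are illustrative assumptions, not the authors' exact protocol or data.

    from collections import Counter
    import math

    def build_prompt(persona: dict, question: str) -> str:
        # Compose a persona-conditioned survey prompt (illustrative format).
        return (
            f"You are a {persona['age']}-year-old {persona['race']} "
            f"{persona['gender']} who identifies as {persona['party']}. "
            f"Answer the following survey question with one option only.\n"
            f"{question}"
        )

    def shannon_entropy(counts: Counter) -> float:
        # Entropy of a response distribution; lower values suggest homogenization.
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

    # Hypothetical response tallies for one demographic subgroup.
    human_responses = Counter({"favor": 48, "oppose": 37, "neither": 15})
    llm_responses = Counter({"favor": 93, "oppose": 5, "neither": 2})

    print("human entropy:", round(shannon_entropy(human_responses), 3))
    print("LLM entropy:  ", round(shannon_entropy(llm_responses), 3))

In this toy comparison the LLM distribution collapses onto the modal answer, which is the kind of pattern the paper's homogenization finding and accuracy-optimization hypothesis describe.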
Similar Papers
Who are you, ChatGPT? Personality and Demographic Style in LLM-Generated Content
Computation and Language
Computers show personalities like people.
Who Has The Final Say? Conformity Dynamics in ChatGPT's Selections
Artificial Intelligence
AI copies others' opinions, not always thinking alone.
ChatGPT or A Silent Everywhere Helper: A Survey of Large Language Models
Computation and Language
Lets computers talk and write like people.