A Benchmark for Zero-Shot Belief Inference in Large Language Models
By: Joseph Malone, Rachith Aiyappa, Byunghwee Lee, and more
Potential Business Impact:
Helps computers understand what people believe.
Beliefs are central to how humans reason, communicate, and form social connections, yet most computational approaches to studying them remain confined to narrow sociopolitical contexts and rely on fine-tuning for optimal performance. Despite the growing use of large language models (LLMs) across disciplines, how well these systems generalize across diverse belief domains remains unclear. We introduce a systematic, reproducible benchmark that evaluates the ability of LLMs to predict individuals' stances on a wide range of topics in a zero-shot setting using data from an online debate platform. The benchmark includes multiple informational conditions that isolate the contribution of demographic context and known prior beliefs to predictive success. Across several small- to medium-sized models, we find that providing more background information about an individual improves predictive accuracy, but performance varies substantially across belief domains. These findings reveal both the capacity and limitations of current LLMs to emulate human reasoning, advancing the study of machine behavior and offering a scalable framework for modeling belief systems beyond the sociopolitical sphere.
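The abstract describes the benchmark's core loop: prompt a model to predict an individual's stance on a debate topic under progressively richer informational conditions, then score zero-shot accuracy per condition. The sketch below illustrates one plausible realization of that design; the condition names, prompt wording, and the `query_model` stub are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a zero-shot belief-inference benchmark loop.
# Condition names, prompt templates, and the query_model stub are
# illustrative assumptions, not the paper's actual setup.
from dataclasses import dataclass, field

@dataclass
class User:
    demographics: str                        # e.g. "34, US, agnostic"
    known_beliefs: list[str] = field(default_factory=list)  # prior stances

def build_prompt(user: User, topic: str, condition: str) -> str:
    """Assemble a zero-shot prompt under one informational condition."""
    parts = []
    if condition in ("demographics", "demographics+beliefs"):
        parts.append(f"Profile: {user.demographics}")
    if condition in ("beliefs", "demographics+beliefs"):
        parts.append("Known stances: " + "; ".join(user.known_beliefs))
    parts.append(f'Topic: "{topic}"')
    parts.append("Does this person agree or disagree? Answer PRO or CON.")
    return "\n".join(parts)

def query_model(prompt: str) -> str:
    """Dummy stand-in for an LLM call; swap in a real model client."""
    return "PRO"  # placeholder so the sketch runs end to end

def accuracy(examples: list[tuple[User, str, str]], condition: str) -> float:
    """Fraction of (user, topic, gold_stance) triples predicted correctly."""
    correct = sum(
        query_model(build_prompt(u, t, condition)).strip().upper() == gold
        for u, t, gold in examples
    )
    return correct / len(examples)

if __name__ == "__main__":
    u = User("34, US, agnostic", ["PRO gun control", "CON school vouchers"])
    data = [(u, "Universal basic income", "PRO")]
    for cond in ("none", "demographics", "beliefs", "demographics+beliefs"):
        print(cond, accuracy(data, cond))
```

Comparing `accuracy()` across such conditions would isolate how much demographic context and known prior beliefs each contribute to predictive success, mirroring the ablation the abstract describes.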
Similar Papers
Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge
Artificial Intelligence
Tests if computers understand real-world chances.
Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs
Artificial Intelligence
Makes AI understand "maybe" better for trust.
Reasoning Capabilities and Invariability of Large Language Models
Computation and Language
Tests if computers can think logically.