Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding
By: Maciej Skorski, Alina Landowska
Potential Business Impact:
AI detects moral content in text more reliably than most human annotators.
How well do large language models understand moral dimensions compared to humans? This first large-scale Bayesian evaluation of market-leading language models provides an answer. In contrast to prior work that relies on deterministic ground truth (majority or inclusion rules), we model annotator disagreement to capture both aleatoric uncertainty (inherent human disagreement) and epistemic uncertainty (model domain sensitivity). We evaluate top language models (Claude Sonnet 4, DeepSeek-V3, Llama 4 Maverick) on 250K+ annotations from ~700 annotators covering 100K+ texts spanning social media, news, and forums. Our GPU-optimized Bayesian framework processed 1M+ model queries and reveals that the AI models typically rank among the top 25% of human annotators, achieving well above-average balanced accuracy. Importantly, the models produce far fewer false negatives than humans, highlighting their more sensitive moral detection.
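The Bayesian treatment of annotator disagreement can be pictured with a minimal sketch: place a conjugate Beta-Binomial posterior over each text's latent label probability given the annotator votes, then score a model's balanced accuracy against labels sampled from that posterior rather than against a single majority vote. The toy data, the flat Beta(1, 1) prior, and all variable names below are illustrative assumptions for one moral dimension; this is not the paper's GPU-optimized implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-text data: positive annotator votes, total votes, and the LLM's binary call.
votes_pos  = np.array([5, 1, 4, 0, 3, 2])   # annotators who flagged the moral dimension
votes_tot  = np.array([6, 6, 5, 6, 6, 5])   # annotators who labelled the text
model_pred = np.array([1, 0, 1, 0, 1, 1])   # LLM judgment per text (assumed)

# Beta(1, 1) prior -> Beta(1 + k, 1 + n - k) posterior over each text's label probability.
alpha = 1.0 + votes_pos
beta  = 1.0 + votes_tot - votes_pos

def balanced_accuracy(y_true, y_pred):
    """Mean of true-positive and true-negative rates; NaN-safe if a class is absent."""
    tpr = np.mean(y_pred[y_true == 1] == 1) if np.any(y_true == 1) else np.nan
    tnr = np.mean(y_pred[y_true == 0] == 0) if np.any(y_true == 0) else np.nan
    return np.nanmean([tpr, tnr])

# Monte Carlo over plausible ground truths: sample label probabilities (aleatoric
# uncertainty), draw one labelling, and score the model against it.
scores = []
for _ in range(2000):
    p = rng.beta(alpha, beta)
    y = rng.binomial(1, p)
    scores.append(balanced_accuracy(y, model_pred))

scores = np.array(scores)
print(f"posterior balanced accuracy: {scores.mean():.3f} "
      f"(94% interval {np.quantile(scores, 0.03):.3f}-{np.quantile(scores, 0.97):.3f})")
```

The same posterior draws can score each human annotator, which is what allows a model to be ranked against the pool of annotators instead of against one deterministic label set.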
Similar Papers
The Morality of Probability: How Implicit Moral Biases in LLMs May Shape the Future of Human-AI Symbiosis
Artificial Intelligence
AI learns to pick "good" choices over "selfish" ones.
Differences in the Moral Foundations of Large Language Models
Computers and Society
Models show different values than people.