Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs
By: Jen-tse Huang, Jiantong Qin, Xueli Qiu, and more
Potential Business Impact:
Helps AI learn what people truly value.
Value alignment is central to the development of safe and socially compatible artificial intelligence. However, how Large Language Models (LLMs) represent and enact human values in real-world decision contexts remains under-explored. We present ValAct-15k, a dataset of 3,000 advice-seeking scenarios derived from Reddit, designed to elicit the ten values defined by Schwartz's Theory of Basic Human Values. Using both scenario-based questions and a traditional value questionnaire, we evaluate ten frontier LLMs (five from U.S. companies, five from Chinese companies) and human participants ($n = 55$). We find near-perfect cross-model consistency in scenario-based decisions (Pearson $r \approx 1.0$), contrasting sharply with the broad variability observed among humans ($r \in [-0.79, 0.98]$). Yet both humans and LLMs show only weak correspondence between self-reported and enacted values ($r = 0.4$ and $0.3$, respectively), revealing a systematic knowledge-action gap. When instructed to "hold" a specific value, LLMs' performance declines by up to $6.6\%$ compared to merely selecting the value, indicating an aversion to role-play. These findings suggest that while alignment training yields normative value convergence, it does not eliminate the human-like incoherence between knowing values and acting upon them.
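To make the knowledge-action comparison concrete, the sketch below shows one way a Pearson correlation between self-reported and enacted value scores could be computed for a single agent (a model or a human participant). The ten Schwartz value names are standard, but the scores and the overall setup are illustrative assumptions, not data or code from the paper.

```python
# Minimal sketch of the knowledge-action correlation described in the abstract.
# The ten Schwartz values are real; all scores below are hypothetical
# placeholders, not data from ValAct-15k or the paper.
from scipy.stats import pearsonr

SCHWARTZ_VALUES = [
    "self-direction", "stimulation", "hedonism", "achievement", "power",
    "security", "conformity", "tradition", "benevolence", "universalism",
]

# Self-reported value priorities (e.g., from a value questionnaire),
# one score per Schwartz value -- hypothetical numbers.
self_reported = [0.72, 0.55, 0.40, 0.65, 0.30, 0.80, 0.60, 0.35, 0.85, 0.78]

# Enacted value priorities inferred from scenario-based decisions --
# also hypothetical numbers.
enacted = [0.50, 0.45, 0.52, 0.48, 0.41, 0.60, 0.47, 0.44, 0.62, 0.55]

# Pearson r quantifies how well stated values track enacted ones;
# a low r would indicate a knowledge-action gap for this agent.
r, p_value = pearsonr(self_reported, enacted)
print(f"knowledge-action correlation: r = {r:.2f} (p = {p_value:.3f})")
```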
Similar Papers
The Pluralistic Moral Gap: Understanding Judgment and Value Differences between Humans and Large Language Models
Computation and Language
Helps AI give better moral advice like people.
Diverse Human Value Alignment for Large Language Models via Ethical Reasoning
Artificial Intelligence
Teaches AI to understand different cultures' rules.
Are We Aligned? A Preliminary Investigation of the Alignment of Responsible AI Values between LLMs and Human Judgment
Software Engineering
AI tools sometimes don't follow the rules they say they do.