
Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences?

Published: May 31, 2025 | arXiv ID: 2506.00751v1

By: Zhuojun Gu, Quan Wang, Shuchu Han

Potential Business Impact:

Finds if AI's words match its actions.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Recent advances in Large Language Models (LLMs) highlight the need to align their behaviors with human values. A critical, yet understudied, issue is the potential divergence between an LLM's stated preferences (its reported alignment with general principles) and its revealed preferences (inferred from decisions in contextualized scenarios). Such deviations raise fundamental concerns for the interpretability, trustworthiness, reasoning transparency, and ethical deployment of LLMs, particularly in high-stakes applications. This work formally defines and proposes a method to measure this preference deviation. We investigate how LLMs may activate different guiding principles in specific contexts, leading to choices that diverge from previously stated general principles. Our approach involves crafting a rich dataset of well-designed prompts as a series of forced binary choices and presenting them to LLMs. We compare LLM responses to general principle prompts (stated preference) with LLM responses to contextualized prompts (revealed preference), using metrics like KL divergence to quantify the deviation. We repeat the analysis across different categories of preferences and on four mainstream LLMs and find that a minor change in prompt format can often pivot the preferred choice regardless of the preference categories and LLMs in the test. This prevalent phenomenon highlights the lack of understanding and control of the LLM decision-making competence. Our study will be crucial for integrating LLMs into services, especially those that interact directly with humans, where morality, fairness, and social responsibilities are crucial dimensions. Furthermore, identifying or being aware of such deviation will be critically important as LLMs are increasingly envisioned for autonomous agentic tasks where continuous human evaluation of all LLMs' intermediary decision-making steps is impossible.
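The abstract names KL divergence as one metric for comparing stated and revealed preferences over forced binary choices. A minimal sketch of that comparison follows. It is not the authors' implementation: the function names, the option labels "A" and "B", the additive smoothing constant, and the sample responses are all assumptions made for illustration.

```python
import math


def choice_distribution(responses, options=("A", "B"), smoothing=1.0):
    """Estimate a distribution over the forced-choice options.

    `responses` is a list of option labels collected by asking the same
    prompt repeatedly. Additive smoothing (an assumption of this sketch)
    keeps every probability strictly positive so the KL term stays finite.
    """
    counts = {opt: smoothing for opt in options}
    for r in responses:
        if r not in counts:
            raise ValueError(f"unexpected response label: {r!r}")
        counts[r] += 1
    total = sum(counts.values())
    return {opt: c / total for opt, c in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same options."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)


# Hypothetical data: answers to a general-principle prompt (stated)
# versus answers to a contextualized scenario prompt (revealed).
stated_responses = ["A"] * 9 + ["B"] * 1
revealed_responses = ["A"] * 3 + ["B"] * 7

stated = choice_distribution(stated_responses)
revealed = choice_distribution(revealed_responses)
deviation = kl_divergence(stated, revealed)
```

A deviation near zero would mean the model chooses the same way in both framings; larger values indicate a wider gap between what it states and what it does. KL divergence is asymmetric, so the direction (stated relative to revealed, as here) is a choice this sketch makes, not one taken from the paper.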

A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications

Computation and Language

Teaches AI to be helpful and kind, your way.

21 Mar 2025 0

91%

Implementing Rational Choice Functions with LLMs and Measuring their Alignment with User Preferences

Artificial Intelligence

Helps computers make choices users prefer.

22 Apr 2025 2

90%

Aligning Multimodal LLM with Human Preference: A Survey

CV and Pattern Recognition

Makes AI understand pictures and sounds better.

18 Mar 2025 1


Country of Origin

🇺🇸 United States

Page Count

19 pages
