Adaptive Preference Aggregation
By: Benjamin Heymann
Potential Business Impact:
Teaches AI to understand what people want.
AI alignment, the challenge of ensuring that AI systems act in accordance with human values, has emerged as a critical problem in the development of systems such as foundation models and recommender systems. However, the current dominant approach, reinforcement learning from human feedback (RLHF), faces known theoretical limitations in aggregating diverse human preferences. Social choice theory provides a framework for aggregating preferences, but it was not developed for the multidimensional applications typical of AI. Leveraging insights from a recently published urn process, this work introduces a preference aggregation strategy that adapts to the user's context and inherits the desirable properties of the maximal lottery, a Condorcet-consistent solution concept.
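To make the solution concept concrete: a maximal lottery is an optimal mixed strategy of the symmetric zero-sum game defined by the pairwise preference margins, so it puts probability only on alternatives that beat or tie every opponent in expectation. The sketch below is only an illustration of that definition, not the paper's adaptive urn-based method; it assumes NumPy/SciPy and computes one maximal lottery from a margin matrix with a standard linear program.

    # Minimal sketch (assumed setup, not the paper's algorithm): compute a
    # maximal lottery from a skew-symmetric pairwise-margin matrix via an LP.
    import numpy as np
    from scipy.optimize import linprog

    def maximal_lottery(margins: np.ndarray) -> np.ndarray:
        """Return a maximal lottery for a skew-symmetric margin matrix.

        margins[i, j] = (# comparisons preferring i over j)
                      - (# comparisons preferring j over i).
        A maximal lottery is a distribution p with margins.T @ p >= 0,
        i.e. an optimal strategy of the symmetric zero-sum margin game.
        """
        n = margins.shape[0]
        # Feasibility LP: find p >= 0, sum(p) = 1, margins.T @ p >= 0.
        res = linprog(
            c=np.zeros(n),                      # any feasible point will do
            A_ub=-margins.T, b_ub=np.zeros(n),  # -(margins.T @ p) <= 0
            A_eq=np.ones((1, n)), b_eq=[1.0],   # p is a probability vector
            bounds=[(0, 1)] * n,
            method="highs",
        )
        return res.x

    # Toy example: a Condorcet cycle a > b > c > a yields the uniform lottery.
    M = np.array([[0, 1, -1],
                  [-1, 0, 1],
                  [1, -1, 0]])
    print(maximal_lottery(M))  # approximately [1/3, 1/3, 1/3]

In this toy cycle no deterministic winner exists, which is exactly the case where a Condorcet-consistent randomized rule such as the maximal lottery is useful.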
Similar Papers
Maximizing the efficiency of human feedback in AI alignment: a comparative analysis
Human-Computer Interaction
Teaches AI to learn faster from people's choices.
A Systematic Evaluation of Preference Aggregation in Federated RLHF for Pluralistic Alignment of LLMs
Computation and Language
Helps AI learn what many different people like.