UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality
By: Zelei Cheng, Xin-Qiang Cai, Yuting Tang, and more
Potential Business Impact:
Teaches AI to better understand what people want.
Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone for aligning large language models (LLMs) with human values. However, existing approaches struggle to capture the multi-dimensional, distributional nuances of human preferences. Methods such as RiC that directly inject raw reward values into prompts face significant numerical sensitivity issues (for instance, LLMs may fail to distinguish between 9.11 and 9.8), while alternatives like MORLHF, Rewarded Soups, and MODPO incur high computational costs by training multiple models. In this work, we introduce Utility-Conditioned Multi-Objective Alignment (UC-MOA), a novel framework that overcomes these limitations. Our approach leverages a diverse set of strictly increasing, non-linear utility functions to transform user-specified preferences into symbolic tokens, which are then used to condition a single LLM. This design not only mitigates numerical reasoning challenges but also substantially reduces training overhead, yielding models that achieve superior Pareto fronts and robust alignment across complex reward dimensions.
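To make the conditioning idea concrete, here is a minimal Python sketch of how a preference might be mapped to a symbolic utility token rather than a raw number. The utility shapes, the token format, and the names (`UTILITY_BANK`, `select_utility_token`, `condition_prompt`, the reward dimensions) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: map a user preference to a symbolic utility token and
# prepend it to the prompt, instead of injecting raw reward numbers.
import math

# Hypothetical bank of strictly increasing, non-linear utilities on [0, 1].
UTILITY_BANK = {
    "U1": lambda r: math.sqrt(r),                       # concave: diminishing returns
    "U2": lambda r: r ** 2,                             # convex: emphasizes high scores
    "U3": lambda r: math.log1p(9 * r) / math.log(10),   # log-shaped, still maps [0, 1] -> [0, 1]
}

def select_utility_token(preference: dict[str, float]) -> str:
    """Pick a utility token for the user's preference.

    `preference` maps reward dimensions (e.g. helpfulness, harmlessness)
    to weights; the selection rule here is a stand-in, not the paper's.
    """
    dominant = max(preference, key=preference.get)
    # Illustrative rule: safety-heavy preferences get the concave utility.
    token = "U1" if dominant == "harmlessness" else "U2"
    return f"<{token}>"

def condition_prompt(prompt: str, preference: dict[str, float]) -> str:
    """Condition the prompt with a symbolic token, so the LLM never has to
    compare raw values such as 9.11 vs 9.8."""
    return f"{select_utility_token(preference)} {prompt}"

if __name__ == "__main__":
    pref = {"helpfulness": 0.3, "harmlessness": 0.7}
    print(condition_prompt("Explain photosynthesis to a child.", pref))
    # -> "<U1> Explain photosynthesis to a child."
```

Because the token is symbolic, a single model can be fine-tuned across many sampled utilities, which is what lets one LLM cover the Pareto front instead of training one model per objective weighting.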
Similar Papers
Pareto Multi-Objective Alignment for Language Models
Machine Learning (CS)
Helps AI learn to balance many different goals.
MOA: Multi-Objective Alignment for Role-Playing Agents
Computation and Language
Teaches AI to be good at many things at once.
Multi-Objective Reinforcement Learning for Large Language Model Optimization: Visionary Perspective
Computation and Language
Teaches AI to do many things well.