AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models
By: Qi Liu, Jingqing Ruan, Hao Li, and more
Potential Business Impact:
Lets one AI model balance many preference goals at once.
Existing multi-objective preference alignment methods for large language models (LLMs) face two limitations: (1) they cannot effectively balance diverse preference dimensions, and (2) their reliance on auxiliary reward and reference models introduces computational complexity. To address these challenges, we propose Adaptive Multi-objective Preference Optimization (AMoPO), a novel framework that achieves dynamic balance across preference dimensions. By introducing a multi-objective optimization paradigm that uses dimension-aware generation metrics as implicit rewards, AMoPO aligns LLMs with diverse preferences without additional reward or reference models. We further introduce an adaptive weight assignment mechanism that models the generation space as a Gaussian distribution, allowing dynamic prioritization of preference dimensions. Empirical results demonstrate that AMoPO outperforms state-of-the-art baselines by 28.5%, and experiments on 7B, 14B, and 32B models reveal its scaling ability. Additional analysis across multiple preference dimensions further verifies its adaptability and effectiveness, validating AMoPO's capability to achieve dimension-aware preference alignment. Our code and datasets are available at https://github.com/Javkonline/AMoPO.
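The abstract names two mechanisms: dimension-aware generation metrics used as implicit rewards, and a Gaussian model of the generation space that assigns adaptive weights to preference dimensions. The sketch below is only a rough, assumption-laden illustration of that general idea, not the authors' formulation: the per-dimension reward tensor, the softmax weighting rule, and all function names are hypothetical; see the repository above for the actual method.

```python
import torch

def adaptive_dimension_weights(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Toy adaptive weighting over preference dimensions (illustrative only).

    `rewards` is a (batch, num_dims) tensor of per-dimension implicit rewards,
    e.g. dimension-aware generation metrics scored on the policy's own samples.
    Each dimension is modeled with a Gaussian fitted to the batch; dimensions
    whose rewards are low relative to their spread receive larger weights, so
    optimization dynamically prioritizes the lagging objectives.
    """
    mean = rewards.mean(dim=0)                 # per-dimension Gaussian mean
    std = rewards.std(dim=0) + eps             # per-dimension Gaussian std
    progress = mean / std                      # standardized progress per dimension
    weights = torch.softmax(-progress, dim=0)  # lower progress -> higher weight, sums to 1
    return weights

def multi_objective_loss(rewards: torch.Tensor) -> torch.Tensor:
    """Weighted scalarization of per-dimension implicit rewards (to be maximized)."""
    weights = adaptive_dimension_weights(rewards).detach()  # weights treated as constants
    return -(rewards * weights).sum(dim=1).mean()

# Usage: rewards come from the model's own generations, so no external
# reward model or reference model is needed (toy values here).
batch_rewards = torch.randn(8, 3)  # 8 samples, 3 preference dimensions
loss = multi_objective_loss(batch_rewards)
```

The key design point this illustrates is that the weights are derived from statistics of the current generations rather than from a trained reward model, which is what removes the auxiliary-model overhead the abstract refers to.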
Similar Papers
AMaPO: Adaptive Margin-attached Preference Optimization for Language Model Alignment
Computation and Language
Teaches AI to learn better from ranked choices.
Robust Multi-Objective Preference Alignment with Online DPO
Computation and Language
Lets AI learn many different human wishes.
Intelligently Weighting Multiple Reference Models for Direct Preference Optimization of LLMs
Machine Learning (CS)
Makes AI learn better from many examples.