AMaPO: Adaptive Margin-attached Preference Optimization for Language Model Alignment
By: Ruibo Deng, Duanyu Feng, Wenqiang Lei
Potential Business Impact:
Teaches AI to learn better from ranked choices.
Offline preference optimization offers a simpler and more stable alternative to RLHF for aligning language models. However, its effectiveness depends critically on ranking accuracy, a metric where further gains are highly impactful. This limitation stems from a fundamental problem that we identify and formalize as the Overfitting-Underfitting Dilemma: current margin designs cause models to apply excessive, wasteful gradients to correctly ranked samples (overfitting) while providing insufficient corrective signals for misranked ones (underfitting). To resolve this dilemma, we propose Adaptive Margin-attached Preference Optimization (AMaPO), a simple yet principled algorithm. AMaPO employs an instance-wise adaptive margin, refined by Z-normalization and exponential scaling, that dynamically reallocates learning effort by amplifying gradients for misranked samples and suppressing them for correct ones. Extensive experiments on widely used benchmarks demonstrate that AMaPO achieves better ranking accuracy and superior downstream alignment performance, and targeted analysis confirms that it successfully mitigates the core overfitting and underfitting issues.
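The abstract describes the mechanism (an instance-wise adaptive margin built from Z-normalized, exponentially scaled reward margins) but not the exact objective. The sketch below is one plausible reading under assumptions: a reference-free, SimPO-style pairwise logistic loss, with the function name `amapo_loss` and the temperatures `beta` and `tau` chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def amapo_loss(chosen_logps, rejected_logps, beta=2.0, tau=1.0):
    """Sketch of an adaptive-margin pairwise preference loss (not the paper's exact formula).

    chosen_logps / rejected_logps: per-sample (e.g., length-normalized) log-probabilities
    of the preferred and dispreferred responses under the policy.
    beta, tau: hypothetical scaling temperatures.
    """
    # Per-sample reward margin; negative values correspond to misranked pairs.
    margins = chosen_logps - rejected_logps

    # Z-normalize margins within the batch so the adaptive term is scale-invariant.
    z = (margins - margins.mean()) / (margins.std() + 1e-8)

    # Exponential scaling: misranked samples (z < 0) receive a large adaptive margin,
    # amplifying their gradients; well-ranked samples (z > 0) receive a small one,
    # suppressing further, wasteful updates. Detached so it acts as a per-instance
    # target rather than a gradient path.
    adaptive_margin = torch.exp(-z / tau).detach()

    # Bradley-Terry-style logistic loss with the instance-wise margin attached.
    return -F.logsigmoid(beta * margins - adaptive_margin).mean()
```

Under this reading, the adaptive margin plays the role that a fixed target margin plays in SimPO-like objectives, except that it is recomputed per instance from the batch statistics, which is how the gradient reallocation described in the abstract would arise.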
Similar Papers
AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models
Machine Learning (CS)
Makes AI better at many things at once.
Offline Preference Optimization via Maximum Marginal Likelihood Estimation
Machine Learning (CS)
Makes AI understand what you like better.
Enhancing Small LLM Alignment through Margin-Based Objective Modifications under Resource Constraints
Computation and Language
Makes small AI understand what people want better.