
DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations

Published: January 2, 2026 | arXiv ID: 2601.00623v1

By: Longtian Qiu, Shan Ning, Chuyu Zhang, and others

Potential Business Impact:

Teaches AI to avoid making up fake answers.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Technical Abstract:

Direct Preference Optimization (DPO) has shown strong potential for mitigating hallucinations in Multimodal Large Language Models (MLLMs). However, existing multimodal DPO approaches often overfit because of difficulty imbalance in the preference data. Our analysis shows that MLLMs tend to overemphasize easily distinguishable preference pairs, which hinders fine-grained hallucination suppression and degrades overall performance. To address this issue, we propose Difficulty-Aware Direct Preference Optimization (DA-DPO), a cost-efficient framework that rebalances learning across preference pairs of differing difficulty. DA-DPO consists of two main components: (1) Difficulty Estimation, which uses pre-trained vision-language models with complementary generative and contrastive objectives and integrates their outputs via a distribution-aware voting strategy to produce robust difficulty scores without additional training; and (2) Difficulty-Aware Training, which reweights preference pairs by their estimated difficulty, down-weighting easy samples and emphasizing harder ones to alleviate overfitting. By prioritizing challenging examples, the framework makes preference optimization more effective without requiring new data or extra fine-tuning stages. Extensive experiments show that DA-DPO consistently improves multimodal preference optimization, yielding greater robustness to hallucinations and better generalization across standard benchmarks while remaining computationally efficient. The project page is available at https://artanic30.github.io/project_pages/DA-DPO/.
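The two components described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not the authors' implementation. The per-pair inputs `gen_gaps` and `con_gaps` are hypothetical. `gen_gaps` measures how much more strongly a generative vision-language model prefers the chosen response over the rejected one. `con_gaps` measures the same margin under a contrastive, CLIP-style image-text score. A larger gap means an easier pair. The sketch substitutes rank normalization plus averaging for the paper's distribution-aware voting. The `1 - easiness` weighting with a floor is just one simple way to down-weight easy pairs. The loss is the standard DPO objective with a per-pair weight applied.

```python
import math
from typing import List, Sequence


def percentile_ranks(values: Sequence[float]) -> List[float]:
    """Map each value to its rank position in [0, 1] within the batch."""
    n = len(values)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for r, i in enumerate(order):
        ranks[i] = r / (n - 1)
    return ranks


def easiness_scores(gen_gaps: Sequence[float], con_gaps: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for distribution-aware voting: rank-normalize each
    signal so that neither model's score scale dominates, then average the two votes.
    Higher output = easier pair (chosen response more clearly preferred)."""
    g = percentile_ranks(gen_gaps)
    c = percentile_ranks(con_gaps)
    return [(a + b) / 2 for a, b in zip(g, c)]


def difficulty_weights(easiness: Sequence[float], floor: float = 0.1) -> List[float]:
    """Assumed weighting: weight = 1 - easiness, clipped below at `floor`,
    then rescaled so the batch mean weight is 1."""
    raw = [max(1.0 - e, floor) for e in easiness]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def weighted_dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
                      weights, beta: float = 0.1) -> float:
    """Per-pair DPO loss -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)),
    combined as a weighted mean. Inputs are sequence log-probabilities."""
    n = len(weights)
    assert len(policy_chosen) == len(policy_rejected) == len(ref_chosen) == len(ref_rejected) == n
    total = 0.0
    for pc, pr, rc, rr, w in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected, weights):
        margin = beta * ((pc - rc) - (pr - rr))
        total += -w * log_sigmoid(margin)
    return total / sum(weights)


if __name__ == "__main__":
    # Toy margins for four preference pairs (illustrative numbers only).
    gen_gaps = [5.0, 0.2, 2.0, -0.5]
    con_gaps = [0.30, 0.01, 0.10, 0.02]
    w = difficulty_weights(easiness_scores(gen_gaps, con_gaps))
    print("weights:", [round(x, 3) for x in w])
    loss = weighted_dpo_loss([-10.0, -12.0, -9.0, -11.0], [-14.0, -12.5, -10.0, -11.2],
                             [-11.0, -12.0, -9.5, -11.0], [-13.0, -12.4, -9.8, -11.1], w)
    print("weighted DPO loss:", round(loss, 4))
```

In the toy batch, both signals rank the first pair as the most clearly separated, so it receives the smallest weight. Rescaling the weights to a mean of 1 keeps the overall loss magnitude comparable to unweighted DPO. As a result, the learning rate does not need retuning just because of the reweighting.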

Similar Papers:

Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization

Artificial Intelligence

Teaches AI to describe pictures without making things up.

13 Jun 2025

Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning

Computation and Language

Makes AI tell the truth, not make things up.

6 Jan 2026

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key

Computer Vision and Pattern Recognition

Makes AI less likely to make up fake answers.

16 Jan 2025

Country of Origin

🇨🇳 China

Page Count

21 pages
