Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
By: Pritam Sarkar, Ali Etemad
Potential Business Impact:
Teaches AI to learn from its video mistakes.
Despite recent advances in Large Video Language Models (LVLMs), they still struggle with fine-grained temporal understanding, hallucinate, and often make basic mistakes even on simple video question-answering tasks, all of which pose significant challenges to their safe and reliable deployment in real-world applications. To address these limitations, we propose a self-alignment framework that enables LVLMs to learn from their own errors. Our proposed framework first obtains a training set of preferred and non-preferred response pairs, where non-preferred responses are generated by incorporating common error patterns that often occur due to inadequate spatio-temporal understanding, spurious correlations between co-occurring concepts, and over-reliance on linguistic cues while neglecting the vision modality, among others. To facilitate self-alignment of LVLMs with the constructed preferred and non-preferred response pairs, we introduce Refined Regularized Preference Optimization (RRPO), a novel preference optimization method that utilizes sub-sequence-level refined rewards and token-wise KL regularization to address the limitations of Direct Preference Optimization (DPO). We demonstrate that RRPO achieves more precise alignment and more stable training compared to DPO. Our experiments and analysis validate the effectiveness of our approach across diverse video tasks, including video hallucination, short- and long-video understanding, and fine-grained temporal reasoning.
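For context, RRPO is positioned as a refinement of Direct Preference Optimization (DPO). The equation below restates the standard DPO objective from the original DPO literature, not this paper's formulation: pi_theta is the model being aligned, pi_ref the frozen reference model, (x, y_w, y_l) a prompt with its preferred and non-preferred responses, sigma the logistic function, and beta a temperature hyperparameter. Per the abstract, RRPO replaces this sequence-level comparison with sub-sequence-level refined rewards and adds token-wise KL regularization; its exact objective is given in the paper and is not reproduced here.

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]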
Similar Papers
LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs
CV and Pattern Recognition
Makes video AI understand what's important.
Aligning Large Vision-Language Models by Deep Reinforcement Learning and Direct Preference Optimization
Machine Learning (CS)
Teaches AI to understand pictures and words better.
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
CV and Pattern Recognition
Protects AI from tricks, keeps answers correct.