AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
By: Chaohu Liu, Tianyi Gui, Yu Liu, and more
Potential Business Impact:
Protects AI from tricks, keeps answers correct.
Large Vision-Language Models (LVLMs), such as GPT-4o and LLaVA, have recently witnessed remarkable advancements and are increasingly being deployed in real-world applications. However, inheriting the sensitivity of visual neural networks, LVLMs remain vulnerable to adversarial attacks, which can result in erroneous or malicious outputs. While existing efforts use adversarial fine-tuning to enhance robustness, they often suffer from performance degradation on clean inputs. In this paper, we propose AdPO, a novel adversarial defense strategy for LVLMs based on preference optimization. For the first time, we reframe adversarial training as a preference optimization problem, aiming to strengthen the model's preference for generating normal outputs on clean inputs while rejecting potentially misleading outputs on adversarial examples. Notably, AdPO achieves this by modifying only the image encoder, e.g., CLIP ViT, resulting in superior clean and adversarial performance on a variety of downstream tasks. Because training involves large language models (LLMs), the computational cost increases significantly; we validate that training on smaller LVLMs and subsequently transferring to larger models achieves competitive performance while maintaining efficiency comparable to baseline methods. Our comprehensive experiments confirm the effectiveness of the proposed AdPO, which offers a novel perspective for future adversarial defense research.
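To make the core idea concrete, here is a minimal PyTorch sketch of how "adversarial training as preference optimization" could look: a DPO-style loss that prefers the normal answer conditioned on the clean image over the misleading answer induced by the adversarial image, with only the vision encoder left trainable. This is an illustrative reconstruction from the abstract, not the authors' released code; the `sequence_logprob` helper, the `pixel_values`/`input_ids` forward signature, and the `vision_tower` attribute name are all assumptions modeled on LLaVA-style models.

```python
import torch
import torch.nn.functional as F

def sequence_logprob(model, image, prompt_ids, target_ids):
    """Summed log-probability of target_ids given the image and prompt.
    Assumes a LLaVA-style forward whose logits align with the
    concatenated (prompt + target) token sequence."""
    ids = torch.cat([prompt_ids, target_ids], dim=-1)
    logits = model(pixel_values=image, input_ids=ids).logits
    # Logits at position t predict token t+1; slice out the target span.
    tgt_logits = logits[:, prompt_ids.size(-1) - 1:-1, :]
    logps = F.log_softmax(tgt_logits, dim=-1)
    return logps.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1).sum(-1)

def adpo_loss(policy, reference, clean_img, adv_img, prompt_ids,
              chosen_ids, rejected_ids, beta=0.1):
    """DPO-style objective: raise the implicit reward of the normal output
    on the clean image relative to the misleading output on the
    adversarial image, measured against a frozen reference model."""
    pi_chosen = sequence_logprob(policy, clean_img, prompt_ids, chosen_ids)
    pi_rejected = sequence_logprob(policy, adv_img, prompt_ids, rejected_ids)
    with torch.no_grad():
        ref_chosen = sequence_logprob(reference, clean_img, prompt_ids, chosen_ids)
        ref_rejected = sequence_logprob(reference, adv_img, prompt_ids, rejected_ids)
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(margin).mean()

# Per the abstract, only the image encoder (e.g., CLIP ViT) is updated;
# the LLM stays frozen. The attribute name below is an assumption.
def freeze_all_but_vision(policy):
    for p in policy.parameters():
        p.requires_grad = False
    for p in policy.vision_tower.parameters():
        p.requires_grad = True
```

Restricting gradients to the image encoder is what keeps the LLM's language behavior intact on clean inputs, which is consistent with the paper's claim of avoiding the clean-performance degradation seen in full adversarial fine-tuning.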
Similar Papers
Aligning Large Vision-Language Models by Deep Reinforcement Learning and Direct Preference Optimization
Machine Learning (CS)
Teaches AI to understand pictures and words better.
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
CV and Pattern Recognition
Teaches AI to see and understand pictures better.
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
CV and Pattern Recognition
Teaches AI to learn from its video mistakes.