Score: 2

Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models

Published: March 24, 2025 | arXiv ID: 2503.18556v1

By: Bin Li, Dehong Gao, Yeyuan Wang, and more

BigTech Affiliations: Alibaba

Potential Business Impact:

Makes AI describe pictures without making things up.

Business Areas:
Visual Search, Internet Services

Despite the significant success of Large Vision-Language Models (LVLMs), these models still suffer from hallucinations when describing images, generating answers that include non-existent objects. It has been reported that these models tend to over-focus on certain irrelevant image tokens that do not contain critical information for answering the question, which distorts the output. To address this, we propose an Instruction-Aligned Visual Attention (IAVA) approach, which identifies irrelevant tokens by comparing changes in attention weights under two different instructions. By applying contrastive decoding, we dynamically adjust the logits generated from the original image tokens and the irrelevant image tokens, reducing the model's over-attention to irrelevant information. Experimental results demonstrate that IAVA consistently outperforms existing decoding techniques on benchmarks such as MME, POPE, and TextVQA in mitigating object hallucinations. Our IAVA approach is available online at https://github.com/Lee-lab558/IAVA.
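The decoding-time idea described in the abstract can be sketched in a few lines: compare attention over the image tokens under two different instructions to flag "irrelevant" tokens, then contrastively combine the logits conditioned on the full image with the logits induced by those flagged tokens. The sketch below is a minimal illustration of that scheme, not the paper's implementation; the function names, the threshold tau, the attention-aggregation choice, and the mixing weight alpha are all assumptions.

```python
import torch

def find_irrelevant_image_tokens(attn_task: torch.Tensor,
                                 attn_neutral: torch.Tensor,
                                 tau: float = 0.1) -> torch.Tensor:
    """Flag image tokens whose attention weight changes little between the
    task instruction and a second, contrasting instruction.

    attn_task, attn_neutral: (num_image_tokens,) attention weights over the
    image tokens, aggregated over heads/layers, one per instruction.
    Returns a boolean mask of tokens treated as irrelevant (one plausible
    criterion; the paper's exact rule may differ).
    """
    delta = (attn_task - attn_neutral).abs()
    return delta < tau


def contrastive_logits(logits_full: torch.Tensor,
                       logits_irrelevant: torch.Tensor,
                       alpha: float = 1.0) -> torch.Tensor:
    """Contrastive-decoding combination: boost what the full visual context
    supports and penalize what the irrelevant tokens alone would generate.

    logits_full: next-token logits conditioned on all image tokens.
    logits_irrelevant: logits conditioned only on the flagged tokens.
    """
    return (1.0 + alpha) * logits_full - alpha * logits_irrelevant


if __name__ == "__main__":
    torch.manual_seed(0)
    n_img, vocab = 16, 32
    mask = find_irrelevant_image_tokens(torch.rand(n_img), torch.rand(n_img))
    adjusted = contrastive_logits(torch.randn(vocab), torch.randn(vocab))
    print(mask.sum().item(), adjusted.shape)
```

In an actual decoding loop, `logits_irrelevant` would come from a second forward pass that exposes only the flagged image tokens, and the adjusted logits would replace the original ones at each generation step.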

Country of Origin
🇨🇳 China

Repos / Data Links
https://github.com/Lee-lab558/IAVA

Page Count
7 pages

Category
Computer Science:
CV and Pattern Recognition