Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation
By: Lexiang Tang, Xianwei Zhuang, Bang Yang, and more
Potential Business Impact:
Fixes AI's mistakes when it describes pictures.
Large vision-language models (LVLMs) have demonstrated impressive capabilities across diverse multimodal tasks, yet they remain highly susceptible to visual hallucinations (VH), often producing confident but inaccurate descriptions of visual content. Building on the insight that not all tokens and attention heads contribute equally to VH mitigation, we introduce VisFlow, a lightweight and training-free framework that alleviates hallucinations by directly modulating attention patterns during inference. To address two primary challenges of VH, namely insufficient visual attention and the dominance of language priors, we identify three problematic attention behaviors in LVLMs: (1) disproportionate allocation of attention to uninformative or trailing visual tokens, (2) over-dependence on the previously generated token, and (3) excessive fixation on system prompts that hinders multimodal integration. To overcome these issues, VisFlow introduces a dual-level Attention Intervention, consisting of Token-level Attention Intervention (TAI), which reinforces attention to salient visual regions, and Head-level Attention Intervention (HAI), which suppresses undue focus on system prompts and adjacent text tokens. Together, these interventions strengthen visual alignment while reducing linguistic bias. Extensive experiments across diverse models and benchmarks demonstrate that VisFlow effectively mitigates hallucinations with minimal computational overhead.
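The two interventions described in the abstract can be illustrated as simple rescalings of an attention distribution. The sketch below is a minimal, conceptual toy in NumPy, not the authors' implementation: the function names, the boost factor `alpha`, the damping factor `beta`, and the choice of which heads and positions to modify are all illustrative assumptions. TAI increases attention mass on visual-token positions; HAI damps attention to system-prompt and adjacent-text positions for selected heads. Both renormalize so each head's attention still sums to one.

```python
import numpy as np

def token_level_intervention(attn, visual_idx, alpha=2.0):
    """TAI sketch: boost attention on visual-token positions, then renormalize.

    attn: (num_heads, seq_len) attention row for the current query token.
    alpha: hypothetical boost factor (illustrative, not from the paper).
    """
    out = attn.copy()
    out[:, visual_idx] *= alpha
    return out / out.sum(axis=-1, keepdims=True)

def head_level_intervention(attn, head_idx, suppress_idx, beta=0.5):
    """HAI sketch: for selected heads, damp attention to system-prompt and
    adjacent-text positions, then renormalize those heads' rows.

    beta: hypothetical damping factor (illustrative, not from the paper).
    """
    out = attn.copy()
    out[np.ix_(head_idx, suppress_idx)] *= beta
    return out / out.sum(axis=-1, keepdims=True)

# Toy example: 2 heads over 6 positions.
# Positions 0-1: system prompt, 2-4: visual tokens, 5: previously generated text.
attn = np.full((2, 6), 1.0 / 6.0)          # uniform attention to start
attn = token_level_intervention(attn, visual_idx=[2, 3, 4])
attn = head_level_intervention(attn, head_idx=[0], suppress_idx=[0, 1, 5])
```

After both steps, each head's row remains a valid distribution, with relatively more mass on the visual positions and (for the intervened head) less on the system prompt and the previous text token.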
Similar Papers
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models
CV and Pattern Recognition
Makes AI describe pictures without making things up.
Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation
CV and Pattern Recognition
Stops AI from making up fake details about pictures.
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
Multiagent Systems
Fixes AI mistakes when talking about pictures.