Modality Bias in LVLMs: Analyzing and Mitigating Object Hallucination via Attention Lens
By: Haohan Zheng, Zhenguo Zhang
Potential Business Impact:
Reduces the model's tendency to describe objects that are not actually in the image.
Large vision-language models (LVLMs) have demonstrated remarkable multimodal comprehension and reasoning capabilities, but they still suffer from severe object hallucination. Previous studies primarily attribute this flaw to linguistic priors caused by the scale mismatch between visual encoders and large language models (LLMs) in LVLMs. Specifically, because current LVLMs are built upon LLMs, they tend to over-rely on textual prompts and the LLM's internal knowledge, generating descriptions inconsistent with visual cues. However, through an in-depth investigation of the mechanisms underlying hallucination, we empirically reveal a previously overlooked phenomenon: during hallucination, LVLMs may ignore not only visual information but also the textual modality, a behavior we term modality bias. This indicates that LVLMs struggle to attend to both visual and textual modalities simultaneously, leading to a fragmented understanding of user-provided instructions. Based on this observation, we propose a simple yet effective training-free method to mitigate object hallucination. Concretely, we intervene on and adjust the attention weights of textual and visual tokens, balancing cross-modal compatibility for better alignment with user intentions. Furthermore, we adopt a contrastive decoding strategy to reduce the LVLM's over-reliance on its parametric knowledge, synergistically enhancing our attention manipulation. Extensive experiments confirm the widespread presence of modality bias in LVLMs. Notably, our method effectively mitigates hallucination across multiple open-source LVLMs and benchmarks, highlighting its generalizability and efficacy.
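The two ingredients the abstract describes, reweighting attention toward visual and textual tokens, then contrastively penalizing the model's image-free (parametric) predictions, can be illustrated with a minimal sketch. This is not the paper's implementation; the functions, the scaling factors `alpha`/`beta`, and the contrastive strength `gamma` are all illustrative assumptions, shown on plain Python lists rather than real model tensors.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def rebalance_attention(attn_logits, is_visual, alpha=1.5, beta=1.1):
    # Hypothetical intervention: before the softmax, upweight attention
    # logits on visual tokens by log(alpha) and on textual tokens by
    # log(beta), so neither modality is starved of attention mass.
    return [x + (math.log(alpha) if v else math.log(beta))
            for x, v in zip(attn_logits, is_visual)]

def contrastive_decode(logits_with_image, logits_text_only, gamma=1.0):
    # Hypothetical contrastive decoding: amplify the difference between
    # the image-conditioned logits and the text-only (parametric-
    # knowledge) logits, discouraging tokens the LLM would produce
    # without looking at the image.
    return [f + gamma * (f - t)
            for f, t in zip(logits_with_image, logits_text_only)]

# Toy demonstration: three tokens, the first one visual.
weights = softmax(rebalance_attention([0.0, 0.0, 0.0],
                                      [True, False, False],
                                      alpha=2.0, beta=1.0))
adjusted = contrastive_decode([2.0, 1.0], [1.5, 1.2], gamma=1.0)
```

After rebalancing, the visual token receives a larger share of attention than either textual token, and the contrastive step boosts the token whose score genuinely depends on the image (2.0 vs. 1.5) while suppressing the one the text-only model already favored.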
Similar Papers
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
CV and Pattern Recognition
Fixes AI mistakes when it sees and talks.
Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
CV and Pattern Recognition
Makes AI understand pictures and words better.
Diving into Mitigating Hallucinations from a Vision Perspective for Large Vision-Language Models
CV and Pattern Recognition
Fixes AI mistakes when describing pictures.