Score: 1

Vision-Language Introspection: Mitigating Overconfident Hallucinations in MLLMs via Interpretable Bi-Causal Steering

Published: January 8, 2026 | arXiv ID: 2601.05159v1

By: Shuliang Liu, Songbo Yang, Dong Fang, and more

Potential Business Impact:

Reduces cases where AI models that process both images and text describe objects that are not actually present in the image.

Business Areas:
Visual Search, Internet Services

Object hallucination critically undermines the reliability of Multimodal Large Language Models, often stemming from a fundamental failure in cognitive introspection, where models blindly trust linguistic priors over specific visual evidence. Existing mitigations remain limited: contrastive decoding approaches operate superficially without rectifying internal semantic misalignments, while current latent steering methods rely on static vectors that lack instance-specific precision. We introduce Vision-Language Introspection (VLI), a training-free inference framework that simulates a metacognitive self-correction process. VLI first performs Attributive Introspection to diagnose hallucination risks via probabilistic conflict detection and localize the causal visual anchors. It then employs Interpretable Bi-Causal Steering to actively modulate the inference process, dynamically isolating visual evidence from background noise while neutralizing blind confidence through adaptive calibration. VLI achieves state-of-the-art performance on advanced models, reducing object hallucination rates by 12.67% on MMHal-Bench and improving accuracy by 5.8% on POPE.
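To make the introspection-then-steering idea concrete, below is a minimal, hypothetical sketch of the kind of training-free inference-time intervention the abstract describes: a conflict score is computed between the vision-conditioned next-token distribution and the language-prior (text-only) distribution, and when the conflict is high the logits are adaptively recalibrated away from the blind prior. The function name, thresholds, and the final logit adjustment (a contrastive-decoding-style blend used here as a stand-in, since the paper's actual steering operator is not specified in the abstract) are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def introspective_calibration(
    logits_with_image: torch.Tensor,   # next-token logits conditioned on image + prompt
    logits_text_only: torch.Tensor,    # next-token logits from the language prior alone
    conflict_threshold: float = 0.2,   # hypothetical threshold for flagging hallucination risk
    max_alpha: float = 1.5,            # hypothetical cap on the steering strength
) -> torch.Tensor:
    """Down-weight the language prior when it conflicts with visual evidence.

    Conflict is measured with a Jensen-Shannon divergence between the
    vision-conditioned and text-only token distributions; the steering
    strength alpha grows with the measured conflict (adaptive calibration).
    """
    p = F.softmax(logits_with_image, dim=-1)
    q = F.softmax(logits_text_only, dim=-1)
    m = 0.5 * (p + q)
    # Jensen-Shannon divergence: a bounded conflict score in [0, log 2].
    js = 0.5 * (
        F.kl_div(m.log(), p, reduction="sum")
        + F.kl_div(m.log(), q, reduction="sum")
    )
    conflict = js.item()

    if conflict < conflict_threshold:
        # Low conflict: prior and visual evidence agree, so leave logits untouched.
        return logits_with_image

    # High conflict: push the distribution away from the text-only prior,
    # scaling the push by how severe the conflict is.
    alpha = min(max_alpha, conflict / conflict_threshold)
    return (1.0 + alpha) * logits_with_image - alpha * logits_text_only


# Usage with toy logits over a 5-token vocabulary.
vision_logits = torch.tensor([2.0, 0.5, -1.0, 0.1, -0.3])
prior_logits = torch.tensor([-0.5, 2.5, 1.0, 0.0, -0.2])
steered = introspective_calibration(vision_logits, prior_logits)
print(F.softmax(steered, dim=-1))
```

In this toy run the two distributions disagree sharply, so the calibration suppresses the prior-favored token; in the low-conflict case the logits pass through unchanged, mirroring the abstract's claim that steering is applied instance-specifically rather than with a static vector.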

Page Count
24 pages

Category
Computer Science:
Computer Vision and Pattern Recognition