CoFi-Dec: Hallucination-Resistant Decoding via Coarse-to-Fine Generative Feedback in Large Vision-Language Models

Published: December 29, 2025 | arXiv ID: 2512.23453v1

By: Zongsheng Cao, Yangfan He, Anran Liu and more

Potential Business Impact:

Makes AI describe pictures without making things up.

Business Areas:

Computer Vision Hardware, Software

Large Vision-Language Models (LVLMs) have achieved impressive progress in multi-modal understanding and generation. However, they still tend to produce hallucinated content that is inconsistent with the visual input, which limits their reliability in real-world applications. We propose CoFi-Dec, a training-free decoding framework that mitigates hallucinations by integrating generative self-feedback with coarse-to-fine visual conditioning. Inspired by the human visual process from global scene perception to detailed inspection, CoFi-Dec first generates two intermediate textual responses conditioned on coarse- and fine-grained views of the original image. These responses are then transformed into synthetic images using a text-to-image model, forming multi-level visual hypotheses that enrich grounding cues. To unify the predictions from these multiple visual conditions, we introduce a Wasserstein-based fusion mechanism that aligns their predictive distributions into a geometrically consistent decoding trajectory. This principled fusion reconciles high-level semantic consistency with fine-grained visual grounding, leading to more robust and faithful outputs. Extensive experiments on six hallucination-focused benchmarks show that CoFi-Dec substantially reduces both entity-level and semantic-level hallucinations, outperforming existing decoding strategies. The framework is model-agnostic, requires no additional training, and can be seamlessly applied to a wide range of LVLMs. The implementation is available at https://github.com/AI-Researcher-Team/CoFi-Dec.
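
The abstract does not spell out the fusion rule, so the following is only a rough sketch of what fusing next-token distributions in a Wasserstein sense could look like, not the authors' implementation. It computes an entropic-regularized Wasserstein barycenter with iterative Bregman projections, a standard technique swapped in here for illustration. The three-token vocabulary, the squared-distance ground cost, the equal weights, the regularization strength `eps`, and the function name `entropic_barycenter` are all assumptions made for this example.

```python
import math

def entropic_barycenter(dists, weights, cost, eps=0.5, iters=200):
    """Entropic-regularized Wasserstein barycenter of discrete distributions.

    Hypothetical helper for illustration only. `dists` are probability
    vectors over the same vocabulary, `weights` sum to 1, and `cost` is a
    square ground-cost matrix between vocabulary items (an assumption here).
    """
    n = len(cost)
    # Gibbs kernel derived from the ground cost.
    K = [[math.exp(-cost[i][j] / eps) for j in range(n)] for i in range(n)]

    def matvec(M, x):
        return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]

    def matvec_t(M, x):
        return [sum(M[j][i] * x[j] for j in range(n)) for i in range(n)]

    u = [[1.0] * n for _ in dists]
    b = [1.0 / n] * n
    for _ in range(iters):
        # Scale each coupling so that one marginal matches its input p.
        Kv = []
        for k, p in enumerate(dists):
            KTu = matvec_t(K, u[k])
            v = [p[i] / KTu[i] for i in range(n)]
            Kv.append(matvec(K, v))
        # The shared marginal is the weighted geometric mean of the K v_k.
        b = [
            math.exp(sum(w * math.log(Kv[k][i]) for k, w in enumerate(weights)))
            for i in range(n)
        ]
        # Rescale each coupling so its other marginal equals b.
        for k in range(len(dists)):
            u[k] = [b[i] / Kv[k][i] for i in range(n)]
    # Renormalize, since b need not sum exactly to 1 before convergence.
    total = sum(b)
    return [x / total for x in b]

# Toy next-token distributions, standing in for predictions conditioned on
# two different visual inputs (values are made up).
p_coarse = [0.8, 0.1, 0.1]
p_fine = [0.1, 0.1, 0.8]
# Assumed ground cost: squared distance between token positions on a line.
cost = [[(i - j) ** 2 for j in range(3)] for i in range(3)]
fused = entropic_barycenter([p_coarse, p_fine], [0.5, 0.5], cost)
print(fused)
```

Unlike a plain arithmetic average of the distributions, an optimal-transport barycenter takes the ground cost into account, so it can place mass on tokens that lie between the inputs under that cost. In a real decoder the cost would have to come from somewhere, for example distances between token embeddings, which is again an assumption rather than something the abstract states.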

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

CV and Pattern Recognition

Fixes AI "talking nonsense" about pictures.

10 Feb 2025 2

90%

Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding

CV and Pattern Recognition

Stops AI from making up fake objects in pictures.

3 Feb 2025 0

89%

Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models -

CV and Pattern Recognition

Stops AI from making up fake answers about pictures.

16 Apr 2025 1


Country of Origin

🇺🇸 United States

Repos / Data Links

github.com

Page Count

15 pages
