A Survey of Multimodal Hallucination Evaluation and Detection
By: Zhiyuan Chen, Yuecong Min, Jie Zhang, and more
Potential Business Impact:
Measures and detects when AI makes up fake things.
Multi-modal Large Language Models (MLLMs) have emerged as a powerful paradigm for integrating visual and textual information, supporting a wide range of multi-modal tasks. However, these models often suffer from hallucination, producing content that appears plausible but contradicts the input or established world knowledge. This survey offers an in-depth review of hallucination evaluation benchmarks and detection methods across Image-to-Text (I2T) and Text-to-Image (T2I) generation tasks. Specifically, we first propose a taxonomy of hallucination based on faithfulness and factuality, incorporating the common types of hallucination observed in practice. We then provide an overview of existing hallucination evaluation benchmarks for both T2I and I2T tasks, highlighting their construction processes, evaluation objectives, and employed metrics. Furthermore, we summarize recent advances in hallucination detection methods, which aim to identify hallucinated content at the instance level and serve as a practical complement to benchmark-based evaluation. Finally, we highlight key limitations in current benchmarks and detection methods, and outline potential directions for future research.
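To make the faithfulness/factuality distinction concrete, here is a minimal illustrative sketch of how such a taxonomy could be organized for I2T and T2I instances. The class names, categories, and examples are assumptions for illustration only, not the survey's actual taxonomy or code.

```python
# Illustrative sketch only: one way to organize hallucination instances along the
# faithfulness/factuality axes and the I2T/T2I task split described in the abstract.
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    I2T = "image-to-text"   # e.g., captioning, visual question answering
    T2I = "text-to-image"   # e.g., image generation from a text prompt


class Axis(Enum):
    FAITHFULNESS = "faithfulness"  # output contradicts the given input (image or prompt)
    FACTUALITY = "factuality"      # output contradicts established world knowledge


@dataclass
class HallucinationInstance:
    task: Task
    axis: Axis
    description: str


# Hypothetical examples of the two axes across the two tasks.
examples = [
    HallucinationInstance(Task.I2T, Axis.FAITHFULNESS,
                          "Caption mentions a dog that is not present in the image."),
    HallucinationInstance(Task.T2I, Axis.FACTUALITY,
                          "Generated image places the Eiffel Tower in London."),
]

for ex in examples:
    print(f"[{ex.task.value} / {ex.axis.value}] {ex.description}")
```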
Similar Papers
Trustworthy Medical Imaging with Large Language Models: A Study of Hallucinations Across Modalities
Image and Video Processing
Studies AI hallucinations in medical pictures.
Hallucination Detection and Evaluation of Large Language Model
Computation and Language
Finds fake answers from smart computer programs.
Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Computation and Language
Checks whether tests for AI hallucination actually work.