EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
By: Ashish Seth, Utkarsh Tyagi, Ramaneswaran Selvakumar, and more
Potential Business Impact:
Finds when AI misreads what it "sees" in videos.
Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance on complex multimodal tasks. While MLLMs excel at visual perception and reasoning in third-person and egocentric videos, they are prone to hallucinations, generating coherent yet inaccurate responses. We present EgoIllusion, the first benchmark to evaluate MLLM hallucinations in egocentric videos. EgoIllusion comprises 1,400 videos paired with 8,000 human-annotated open- and closed-ended questions designed to trigger hallucinations based on visual and auditory cues in egocentric videos. Evaluations across ten MLLMs reveal significant challenges: even powerful models like GPT-4o and Gemini achieve only 59% accuracy. EgoIllusion lays the foundation for developing robust benchmarks to evaluate the effectiveness of MLLMs and spurs the development of better egocentric MLLMs with reduced hallucination rates. Our benchmark will be open-sourced for reproducibility.
Similar Papers
HalluLens: LLM Hallucination Benchmark
Computation and Language
Stops AI from making up fake answers.
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding
CV and Pattern Recognition
Finds when AI makes up wrong stories about long videos.