Score: 3

SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding

Published: December 4, 2025 | arXiv ID: 2512.04643v1

By: Chang-Hsun Wu, Kai-Po Chang, Yu-Yang Sheng, and more

BigTech Affiliations: NVIDIA

Potential Business Impact:

Helps video AI models keep the order and timing of events straight, so their descriptions of what happens when are less likely to be made up.

Business Areas:
Visual Search, Internet Services

Video Large Language Models (VideoLLMs) have shown remarkable progress in video understanding. However, these models still struggle to effectively perceive and exploit the rich temporal information in videos when responding to user queries. As a result, they often generate descriptions of events that are temporally inconsistent or causally implausible, causing severe hallucination issues. While most prior studies have focused on spatial hallucinations (e.g., object mismatches), temporal reasoning in video understanding remains relatively underexplored. To address this issue, we propose Self-Diagnostic Contrastive Decoding (SEASON), a training-free method that adaptively enhances temporal and spatial faithfulness for each output token. It achieves this by dynamically diagnosing each token's hallucination tendency and applying adaptive contrastive decoding against its corresponding temporal and spatial negatives. Extensive experiments demonstrate that SEASON outperforms all existing training-free hallucination mitigation approaches on three hallucination examination benchmarks, while further improving VideoLLMs across four general video understanding benchmarks. The code will be released upon acceptance.

Country of Origin
🇺🇸 🇹🇼 United States, Taiwan

Page Count
16 pages

Category
Computer Science:
Computer Vision and Pattern Recognition