DIVER: Dynamic Iterative Visual Evidence Reasoning for Multimodal Fake News Detection
By: Weilin Zhou, Zonghao Ying, Chunlei Meng, and more
Multimodal fake news detection is crucial for mitigating adversarial misinformation. Existing methods, which rely on static fusion or LLMs, suffer from computational redundancy and hallucination risks due to weak visual grounding. To address this, we propose DIVER (Dynamic Iterative Visual Evidence Reasoning), a framework built on a progressive, evidence-driven reasoning paradigm. DIVER first establishes a strong text-based baseline through language analysis, leveraging intra-modal consistency to filter unreliable or hallucinated claims. Only when textual evidence is insufficient does the framework introduce visual information, where inter-modal alignment verification adaptively determines whether deeper visual inspection is necessary. For samples exhibiting significant cross-modal semantic discrepancies, DIVER selectively invokes fine-grained visual tools (e.g., OCR and dense captioning) to extract task-relevant evidence, which is iteratively aggregated via uncertainty-aware fusion to refine multimodal reasoning. Experiments on Weibo, Weibo21, and GossipCop demonstrate that DIVER outperforms state-of-the-art baselines by an average of 2.72%, while also improving inference efficiency with a reduced latency of 4.12 s.
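The staged, evidence-escalating control flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold values, the stub scoring functions (`analyze_text`, `alignment_score`, `run_visual_tools`), and the confidence-weighted fusion rule are all hypothetical placeholders standing in for the paper's learned components.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str         # "real" or "fake"
    confidence: float  # in [0, 1]

# Hypothetical thresholds; the paper does not publish these values.
TEXT_CONF_THRESHOLD = 0.9
ALIGNMENT_THRESHOLD = 0.5

def analyze_text(text: str) -> Verdict:
    # Stand-in for stage 1: the text-only baseline with intra-modal
    # consistency checks. Here, a toy keyword heuristic.
    score = 0.95 if "official statement" in text else 0.4
    return Verdict("real" if score > 0.5 else "fake", abs(score - 0.5) * 2)

def alignment_score(text: str, image_caption: str) -> float:
    # Stand-in for stage 2: inter-modal alignment verification.
    # Here, crude Jaccard overlap between text and image caption.
    t, c = set(text.lower().split()), set(image_caption.lower().split())
    return len(t & c) / max(len(t | c), 1)

def run_visual_tools(image_caption: str) -> list[str]:
    # Stand-in for stage 3: fine-grained tools (OCR, dense captioning).
    return [f"ocr:{image_caption}", f"dense_caption:{image_caption}"]

def fuse(verdicts: list[Verdict]) -> Verdict:
    # Uncertainty-aware fusion (placeholder): weight each verdict
    # by its confidence and take the weighted majority.
    total = sum(v.confidence for v in verdicts) or 1.0
    real_mass = sum(v.confidence for v in verdicts if v.label == "real")
    score = real_mass / total
    return Verdict("real" if score >= 0.5 else "fake",
                   max(v.confidence for v in verdicts))

def diver(text: str, image_caption: str) -> Verdict:
    v_text = analyze_text(text)
    if v_text.confidence >= TEXT_CONF_THRESHOLD:
        return v_text            # textual evidence suffices: stop early
    if alignment_score(text, image_caption) >= ALIGNMENT_THRESHOLD:
        return v_text            # modalities agree: no deep inspection
    evidence = run_visual_tools(image_caption)
    # Placeholder verdict derived from the visual evidence.
    v_visual = Verdict("fake", 0.6) if evidence else v_text
    return fuse([v_text, v_visual])
```

The early returns are what the abstract calls progressive reasoning: visual tools, the most expensive step, run only when text is inconclusive and the modalities disagree, which is where the reported latency savings would come from.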