A New Dataset and Benchmark for Grounding Multimodal Misinformation
By: Bingjian Yang, Danni Xu, Kaipeng Niu, and more
Potential Business Impact:
Finds fake videos by checking words, sounds, and pictures.
The proliferation of online misinformation videos poses serious societal risks. Current datasets and detection methods primarily target binary classification or single-modality localization based on post-processed data, lacking the interpretability needed to counter persuasive misinformation. In this paper, we introduce the task of Grounding Multimodal Misinformation (GroundMM), which verifies multimodal content and localizes misleading segments across modalities. We present the first real-world dataset for this task, GroundLie360, featuring a taxonomy of misinformation types, fine-grained annotations across text, speech, and visuals, and validation with Snopes evidence and annotator reasoning. We also propose a VLM-based, QA-driven baseline, FakeMark, using single- and cross-modal cues for effective detection and grounding. Our experiments highlight the challenges of this task and lay a foundation for explainable multimodal misinformation detection.
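To make the grounding task concrete, below is a minimal sketch of what a GroundMM-style prediction could look like: a video-level verdict plus misleading segments localized per modality (text, speech, visual). The structure, field names, and label strings here are illustrative assumptions, not the actual GroundLie360 annotation schema or FakeMark output format.

```python
# Hypothetical sketch of a GroundMM-style prediction record: a video-level
# verdict plus misleading segments localized per modality. Field names and
# label values are illustrative assumptions, not the paper's schema.
from dataclasses import dataclass, field
from typing import List, Literal

Modality = Literal["text", "speech", "visual"]

@dataclass
class MisleadingSegment:
    modality: Modality   # which track the misleading content appears in
    start: float         # segment start time, in seconds
    end: float           # segment end time, in seconds
    misinfo_type: str    # e.g. "out-of-context", "fabricated-claim" (assumed labels)
    rationale: str       # free-text explanation supporting the localization

@dataclass
class GroundMMPrediction:
    video_id: str
    verdict: Literal["real", "fake"]                 # video-level verification result
    segments: List[MisleadingSegment] = field(default_factory=list)

# Example: a prediction flagging one misleading speech segment and one visual segment.
pred = GroundMMPrediction(
    video_id="demo_0001",
    verdict="fake",
    segments=[
        MisleadingSegment("speech", 12.0, 18.5, "fabricated-claim",
                          "Narration asserts an event that the cited evidence contradicts."),
        MisleadingSegment("visual", 30.0, 42.0, "out-of-context",
                          "Footage is taken from an unrelated earlier event."),
    ],
)
print(pred.verdict, [(s.modality, s.start, s.end) for s in pred.segments])
```

A record like this pairs each localized segment with its misinformation type and a rationale, which is the kind of fine-grained, explainable output the task calls for beyond a binary real/fake label.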
Similar Papers
MMD-Thinker: Adaptive Multi-Dimensional Thinking for Multimodal Misinformation Detection
CV and Pattern Recognition
Finds fake online pictures and stories better.
Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal Manipulations
CV and Pattern Recognition
Finds fake pictures with matching fake stories.
Multimodal Fact Checking with Unified Visual, Textual, and Contextual Representations
Computation and Language
Spots fake news using both words and pictures.