M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset
By: Jiahui Geng, Jonathan Tonglet, Iryna Gurevych
Potential Business Impact:
Helps computers check if pictures and words tell the truth.
Existing real-world datasets for multimodal automated fact-checking have multiple limitations: they contain few instances, focus on only one or two languages and tasks, suffer from evidence leakage, or depend on external sets of news articles for sourcing true claims. To address these shortcomings, we introduce M4FC, a new real-world dataset comprising 4,982 images paired with 6,980 claims. The images, verified by professional fact-checkers from 22 organizations, represent diverse cultural and geographic contexts. Each claim is available in one or two of ten languages. M4FC spans six multimodal fact-checking tasks: visual claim extraction, claimant intent prediction, fake detection, image contextualization, location verification, and verdict prediction. We provide baseline results for all tasks and analyze how combining intermediate tasks influences downstream verdict prediction performance. We make our dataset and code available.
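The abstract describes image-claim pairs annotated for six tasks. Below is a minimal Python sketch of what one such record and a verdict-prediction input pipeline might look like; all field and function names (M4FCRecord, verdict_inputs, etc.) are illustrative assumptions, not the released schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical layout for one M4FC instance; field names are
# assumptions for illustration, not the dataset's actual schema.
@dataclass
class M4FCRecord:
    image_path: str                        # one of the 4,982 fact-checked images
    claim: str                             # one of the 6,980 paired claims
    languages: list[str]                   # one or two of the ten dataset languages
    fact_checker: str                      # one of the 22 source organizations
    # Labels for the six tasks (not every label need exist per instance):
    visual_claim: Optional[str] = None     # visual claim extraction
    claimant_intent: Optional[str] = None  # claimant intent prediction
    is_fake: Optional[bool] = None         # fake detection
    context: Optional[str] = None          # image contextualization
    location: Optional[str] = None         # location verification
    verdict: Optional[str] = None          # verdict prediction

def verdict_inputs(record: M4FCRecord, use_intermediate: bool = True) -> dict:
    """Assemble inputs for verdict prediction, optionally adding
    intermediate-task outputs, mirroring the paper's analysis of how
    combining intermediate tasks influences downstream performance."""
    inputs = {"image": record.image_path, "claim": record.claim}
    if use_intermediate:
        inputs.update(
            context=record.context,
            location=record.location,
            claimant_intent=record.claimant_intent,
        )
    return inputs
```

A baseline would call verdict_inputs with use_intermediate=False, then compare against runs that feed in intermediate-task predictions to measure their downstream effect.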
Similar Papers
MMM-Fact: A Multimodal, Multi-Domain Fact-Checking Dataset with Multi-Level Retrieval Difficulty
Social and Information Networks
Helps computers spot fake news with text, images, and video.
XFacta: Contemporary, Real-World Dataset and Evaluation for Multimodal Misinformation Detection with Multimodal LLMs
Computation and Language
Finds fake news shared with pictures and words.
Multilingual, Multimodal Pipeline for Creating Authentic and Structured Fact-Checked Claim Dataset
Computation and Language
Helps stop fake news with pictures and words.