Perception, Understanding and Reasoning, A Multimodal Benchmark for Video Fake News Detection

Published: October 28, 2025 | arXiv ID: 2510.24816v1

By: Cui Yakun, Fushuo Huo, Weijie Shi, and more

Potential Business Impact:

Helps AI models detect fake news videos more reliably.

Business Areas:
Image Recognition, Data and Analytics, Software

The advent of multi-modal large language models (MLLMs) has greatly advanced research into video fake news detection (VFND). Traditional video-based FND benchmarks typically focus on the accuracy of the final decision and fail to provide fine-grained assessment of the entire detection process, leaving that process a black box. We therefore introduce MVFNDB (Multi-modal Video Fake News Detection Benchmark), grounded in an empirical analysis that provides the foundation for task definitions. The benchmark comprises 10 tasks, meticulously crafted to probe MLLMs' perception, understanding, and reasoning capacities during detection, and features 9,730 human-annotated video-related questions based on a carefully constructed taxonomy of VFND abilities. To validate the impact of combining multiple features on the final result, we design a novel framework named MVFND-CoT, which incorporates reasoning over both creator-added content and original shooting footage. Building upon the benchmark, we conduct an in-depth analysis of the deeper factors influencing accuracy, including video processing strategies and the alignment between video features and model capabilities. We believe this benchmark will lay a solid foundation for future evaluation and advancement of MLLMs in the domain of video fake news detection.
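The abstract describes MVFND-CoT as reasoning separately over creator-added content and the original shooting footage before reaching a verdict. A minimal sketch of such a two-stage chain-of-thought prompt might look like the following; the function name, field names, and prompt wording are illustrative assumptions, not the authors' actual implementation:

```python
# Hypothetical sketch of an MVFND-CoT-style prompt builder. The paper's
# framework reasons over creator-added content (titles, captions, overlays)
# and the original shooting footage before a combined verdict; this sketch
# only illustrates that two-stage structure.

def build_mvfnd_cot_prompt(creator_content: dict, footage_notes: str) -> str:
    """Compose a two-stage chain-of-thought prompt for an MLLM."""
    creator_block = "\n".join(f"- {k}: {v}" for k, v in creator_content.items())
    return (
        "Step 1 (creator-added content): assess whether the editorial "
        "elements below are internally consistent and plausible.\n"
        f"{creator_block}\n\n"
        "Step 2 (original footage): assess whether the shooting footage "
        "supports the claimed event.\n"
        f"- observations: {footage_notes}\n\n"
        "Step 3: combine both assessments and answer REAL or FAKE, "
        "explaining your reasoning step by step."
    )

prompt = build_mvfnd_cot_prompt(
    {"title": "Giant wave hits city center", "caption": "LIVE footage!"},
    "static skyline, no water visible, audio mismatched",
)
print(prompt)
```

In practice, the composed prompt would be sent to an MLLM together with sampled video frames; the two explicit stages are what let a benchmark score the perception, understanding, and reasoning steps separately rather than only the final label.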

Country of Origin
🇭🇰 Hong Kong

Page Count
20 pages

Category
Computer Science:
CV and Pattern Recognition