Score: 0

MM-FusionNet: Context-Aware Dynamic Fusion for Multi-modal Fake News Detection with Large Vision-Language Models

Published: August 5, 2025 | arXiv ID: 2508.05684v1

By: Junhao He , Tianyu Liu , Jingyuan Zhao and more

Potential Business Impact:

Finds fake news using text and pictures.

The proliferation of multi-modal fake news on social media poses a significant threat to public trust and social stability. Traditional detection methods, primarily text-based, often fall short due to the deceptive interplay between misleading text and images. While Large Vision-Language Models (LVLMs) offer promising avenues for multi-modal understanding, effectively fusing diverse modal information, especially when their importance is imbalanced or contradictory, remains a critical challenge. This paper introduces MM-FusionNet, an innovative framework leveraging LVLMs for robust multi-modal fake news detection. Our core contribution is the Context-Aware Dynamic Fusion Module (CADFM), which employs bi-directional cross-modal attention and a novel dynamic modal gating network. This mechanism adaptively learns and assigns importance weights to textual and visual features based on their contextual relevance, enabling intelligent prioritization of information. Evaluated on the large-scale Multi-modal Fake News Dataset (LMFND) comprising 80,000 samples, MM-FusionNet achieves a state-of-the-art F1-score of 0.938, surpassing existing multi-modal baselines by approximately 0.5% and significantly outperforming single-modal approaches. Further analysis demonstrates the model's dynamic weighting capabilities, its robustness to modality perturbations, and performance remarkably close to human-level, underscoring its practical efficacy and interpretability for real-world fake news detection.

Page Count
10 pages

Category
Computer Science:
Cryptography and Security