A Cross-Modal Rumor Detection Scheme via Contrastive Learning by Exploring Text and Image internal Correlations

Published: August 15, 2025 | arXiv ID: 2508.11141v1

By: Bin Ma, Yifei Zhang, Yongjin Xian, and more

Potential Business Impact:

Finds fake news by checking text and pictures.

Existing rumor detection methods often neglect the content within images, as well as the inherent relationships between textual context and images across different visual scales, and therefore lose information critical to rumor identification. To address these issues, this paper presents a novel cross-modal rumor detection scheme based on contrastive learning, namely the Multi-scale Image and Context Correlation exploration algorithm (MICC). Specifically, we design an SCLIP encoder that, through contrastive pretraining, generates unified semantic embeddings for text and multi-scale image patches, so that their relevance can be measured by dot-product similarity. Building on this, a Cross-Modal Multi-Scale Alignment module constructs a cross-modal relevance matrix between the text and the multi-scale image patches and applies a Top-K selection strategy to identify the image regions most relevant to the textual semantics; this selection is guided by mutual information maximization and the information bottleneck principle. Moreover, a scale-aware fusion network integrates the highly correlated multi-scale image features with global text features, assigning adaptive weights to image regions according to their semantic importance and cross-modal relevance. The proposed method has been extensively evaluated on two real-world datasets, where it achieves a substantial performance improvement over existing state-of-the-art rumor detection approaches, highlighting its effectiveness and potential for practical applications.
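The abstract states that SCLIP is trained with contrastive pretraining so that the relevance between text and image patches can be read off as a dot product. The paper's exact objective is not given here; a minimal sketch, assuming the standard CLIP-style symmetric InfoNCE loss over a batch of matched text and image (or patch) embeddings, follows. The function names and the temperature of 0.07 are illustrative assumptions, not taken from MICC.

```python
import math
from typing import List

Vector = List[float]

def l2_normalize(v: Vector) -> Vector:
    # Unit-length vectors make the dot product equal to cosine similarity.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def log_softmax(row: List[float]) -> List[float]:
    m = max(row)  # log-sum-exp trick for numerical stability
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_style_loss(text_emb: List[Vector], image_emb: List[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: the i-th text and the i-th image form the positive pair."""
    texts = [l2_normalize(t) for t in text_emb]
    images = [l2_normalize(v) for v in image_emb]
    n = len(texts)
    logits = [[dot(t, v) / temperature for v in images] for t in texts]
    # Text-to-image direction: row i should put most probability on column i.
    loss_t2i = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Image-to-text direction: column j should put most probability on row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_i2t = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_t2i + loss_i2t) / 2

texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_style_loss(texts, [[2.0, 0.0], [0.0, 3.0]])
swapped = clip_style_loss(texts, [[0.0, 3.0], [2.0, 0.0]])
print(aligned, swapped)
```

Matched pairs push the loss toward zero while swapped pairs are penalized heavily; once such an encoder is trained, a plain dot product between normalized embeddings can serve as the relevance score used in later stages.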
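The alignment and fusion steps can likewise be sketched under stated assumptions. Below, a relevance matrix holds dot products between text-token embeddings and image patches pooled from several grid scales; each patch is scored by its best-matching token, the Top-K patches are kept, and their features are combined with softmax weights derived from those scores before being concatenated with a mean-pooled global text feature. The max-over-tokens scoring, the softmax weighting, and all names here are hypothetical simplifications: the mutual-information and information-bottleneck guidance and the learned scale-aware weights described in the abstract are not modeled.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def flatten_scales(patches_by_scale: Dict[str, List[Vector]]) -> List[Tuple[str, int, Vector]]:
    """Pool patches from every scale into one list, tagged with (scale, index)."""
    return [(scale, i, p) for scale, ps in patches_by_scale.items() for i, p in enumerate(ps)]

def relevance_matrix(text_tokens: List[Vector], patches: List[Vector]) -> List[List[float]]:
    """R[i][j] = dot-product similarity between text token i and patch j."""
    return [[dot(t, p) for p in patches] for t in text_tokens]

def softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_and_fuse(text_tokens: List[Vector], patches_by_scale: Dict[str, List[Vector]],
                    k: int = 2, temperature: float = 0.1):
    """Return (fused feature, selected patches as (scale, index, score))."""
    tagged = flatten_scales(patches_by_scale)
    vectors = [p for _, _, p in tagged]
    rel = relevance_matrix(text_tokens, vectors)
    # Score each patch by its best-matching text token (an assumption).
    scores = [max(row[j] for row in rel) for j in range(len(vectors))]
    top = sorted(range(len(vectors)), key=lambda j: scores[j], reverse=True)[:k]
    weights = softmax([scores[j] for j in top], temperature)
    dim = len(vectors[0])
    # Relevance-weighted sum of the selected patch features.
    image_feat = [sum(w * vectors[j][d] for w, j in zip(weights, top)) for d in range(dim)]
    # Mean-pooled text tokens stand in for the global text feature.
    global_text = [sum(t[d] for t in text_tokens) / len(text_tokens) for d in range(dim)]
    selected = [(tagged[j][0], tagged[j][1], scores[j]) for j in top]
    return global_text + image_feat, selected

text_tokens = [[1.0, 0.0], [0.6, 0.8]]
patches_by_scale = {
    "1x1": [[0.0, 1.0]],
    "2x2": [[1.0, 0.0], [0.8, 0.6], [-1.0, 0.0], [0.0, -1.0]],
}
fused, selected = select_and_fuse(text_tokens, patches_by_scale, k=2)
print(selected)
print(fused)
```

With these toy inputs, both selected patches come from the 2x2 scale because they align best with the text tokens, and the first of them receives the larger fusion weight since its relevance score is higher.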

MMSD3.0: A Multi-Image Benchmark for Real-World Multimodal Sarcasm Detection

CV and Pattern Recognition

Helps computers spot sarcasm in multiple pictures.

27 Oct 2025 1

88%

Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents

CV and Pattern Recognition

Lets computers understand web pages with text and pictures.

21 Oct 2025 1

88%

UMCL: Unimodal-generated Multimodal Contrastive Learning for Cross-compression-rate Deepfake Detection

CV and Pattern Recognition

Finds fake videos even when they are squeezed.

24 Nov 2025 0

Page Count: 25 pages