MMSD3.0: A Multi-Image Benchmark for Real-World Multimodal Sarcasm Detection

Published: October 27, 2025 | arXiv ID: 2510.23299v1

By: Haochen Zhao, Yuyao Kong, Yongxiu Xu and more

Potential Business Impact:

Helps computers spot sarcasm in multiple pictures.

Business Areas:

Visual Search Internet Services

Despite progress in multimodal sarcasm detection, existing datasets and methods predominantly focus on single-image scenarios, overlooking potential semantic and affective relations across multiple images. This leaves a gap in modeling cases where sarcasm is triggered by multi-image cues in real-world settings. To bridge this gap, we introduce MMSD3.0, a new benchmark composed entirely of multi-image samples curated from tweets and Amazon reviews. We further propose the Cross-Image Reasoning Model (CIRM), which performs targeted cross-image sequence modeling to capture latent inter-image connections. In addition, we introduce a relevance-guided, fine-grained cross-modal fusion mechanism based on text-image correspondence to reduce information loss during integration. We establish a comprehensive suite of strong and representative baselines and conduct extensive experiments, showing that MMSD3.0 is an effective and reliable benchmark that better reflects real-world conditions. Moreover, CIRM demonstrates state-of-the-art performance across MMSD, MMSD2.0 and MMSD3.0, validating its effectiveness in both single-image and multi-image scenarios.
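
The abstract describes the relevance-guided fusion only at a high level. As an illustration of the general idea, and not the authors' CIRM implementation, the minimal sketch below weights each image embedding by its similarity to the text embedding before pooling. The function names (`cosine`, `relevance_weights`, `fuse`), the use of cosine similarity with a softmax, and the toy vectors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relevance_weights(text_vec, image_vecs):
    """Softmax over text-image cosine similarities (one weight per image)."""
    scores = [cosine(text_vec, img) for img in image_vecs]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(text_vec, image_vecs):
    """Hypothetical fusion: relevance-weighted image pooling, then concatenation."""
    weights = relevance_weights(text_vec, image_vecs)
    dim = len(image_vecs[0])
    pooled = [
        sum(w * img[i] for w, img in zip(weights, image_vecs))
        for i in range(dim)
    ]
    return text_vec + pooled


# Toy example: the first image aligns with the text, the second does not.
text = [1.0, 0.0]
images = [[1.0, 0.0], [0.0, 1.0]]
weights = relevance_weights(text, images)
fused = fuse(text, images)
```

In this toy case the first image receives the larger weight because it points in the same direction as the text vector. A real system would presumably use learned encoders and token-level or patch-level correspondence rather than a single vector per modality.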

"Humor, Art, or Misinformation?": A Multimodal Dataset for Intent-Aware Synthetic Image Detection

CV and Pattern Recognition

Finds if AI pictures are funny, art, or lies.

28 Aug 2025

88%

A Cross-Modal Rumor Detection Scheme via Contrastive Learning by Exploring Text and Image internal Correlations

CV and Pattern Recognition

Finds fake news by checking text and pictures.

15 Aug 2025

Page Count

13 pages
