
CINEMAE: Leveraging Frozen Masked Autoencoders for Cross-Generator AI Image Detection

Published: November 9, 2025 | arXiv ID: 2511.06325v1

By: Minsuk Jang, Hyeonseo Jeong, Minseok Son and more

Potential Business Impact:

Finds fake pictures made by computers.

Business Areas:

Image Recognition, Data and Analytics, Software

While context-based detectors have achieved strong generalization for AI-generated text by measuring distributional inconsistencies, image-based detectors still struggle with overfitting to generator-specific artifacts. We introduce CINEMAE, a novel paradigm for AIGC image detection that adapts the core principles of text detection methods to the visual domain. Our key insight is that a Masked AutoEncoder (MAE), trained to reconstruct masked patches conditioned on visible context, naturally encodes semantic consistency expectations. We formalize this reconstruction process probabilistically, computing the conditional Negative Log-Likelihood (NLL), -log p(masked | visible), to quantify local semantic anomalies. By aggregating these patch-level statistics with global MAE features through learned fusion, CINEMAE achieves strong cross-generator generalization. Trained exclusively on Stable Diffusion v1.4, our method achieves over 95% accuracy on all eight unseen generators in the GenImage benchmark, substantially outperforming state-of-the-art detectors. This demonstrates that context-conditional reconstruction uncertainty provides a robust, transferable signal for AIGC detection.
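To make the patch-level NLL idea concrete, here is a minimal sketch of how a conditional NLL could be scored for each masked patch once a reconstruction is available. It is not the paper's implementation: the abstract does not state the likelihood model, so the sketch assumes an isotropic Gaussian with a fixed standard deviation `sigma` centered on the reconstructed patch. The function names `patch_nll` and `masked_patch_nlls`, and the flat-list patch representation, are hypothetical and chosen only for illustration.

```python
import math


def patch_nll(target, predicted, sigma=1.0):
    """NLL of one patch under an ASSUMED isotropic Gaussian likelihood.

    target:    pixel values of the true (masked) patch, as a flat list
    predicted: reconstructed pixel values for that patch (the Gaussian mean)
    sigma:     assumed fixed standard deviation (not taken from the paper)
    """
    if len(target) != len(predicted):
        raise ValueError("target and predicted must have the same length")
    var = sigma ** 2
    sq_err = sum((t - p) ** 2 for t, p in zip(target, predicted))
    n = len(target)
    # -log N(target | predicted, var * I), summed over the n pixels
    return 0.5 * (sq_err / var + n * math.log(2 * math.pi * var))


def masked_patch_nlls(targets, predictions, masked_idx, sigma=1.0):
    """Score only the masked patches; visible patches are the conditioning context."""
    return {i: patch_nll(targets[i], predictions[i], sigma) for i in masked_idx}


if __name__ == "__main__":
    # Toy data: three patches of four pixels each; patches 0 and 2 were masked.
    true_patches = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.5, 0.5], [0.9, 0.8, 0.7, 0.6]]
    recon_patches = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.5, 0.5], [0.2, 0.1, 0.0, 0.3]]
    scores = masked_patch_nlls(true_patches, recon_patches, masked_idx=[0, 2])
    for idx, nll in sorted(scores.items()):
        print(f"patch {idx}: NLL = {nll:.4f}")
```

Under this assumption, a patch that is poorly predicted from its visible context receives a higher NLL. Per the abstract, such patch-level statistics are then aggregated with global MAE features through learned fusion, which this sketch does not attempt to reproduce.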

Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos

CV and Pattern Recognition

Teaches computers to understand sound, sight, and words.

16 Jul 2025 1

89%

CrossVideoMAE: Self-Supervised Image-Video Representation Learning with Masked Autoencoders

CV and Pattern Recognition

Teaches computers to understand videos better.

8 Feb 2025 1

89%

CoMA: Complementary Masking and Hierarchical Dynamic Multi-Window Self-Attention in a Unified Pre-training Framework

CV and Pattern Recognition

Teaches computers to see faster and better.

8 Nov 2025 1


Page Count

9 pages
