CINEMAE: Leveraging Frozen Masked Autoencoders for Cross-Generator AI Image Detection
By: Minsuk Jang, Hyeonseo Jeong, Minseok Son and more
Potential Business Impact:
Finds fake pictures made by computers.
While context-based detectors have achieved strong generalization for AI-generated text by measuring distributional inconsistencies, image-based detectors still struggle with overfitting to generator-specific artifacts. We introduce CINEMAE, a novel paradigm for AIGC image detection that adapts the core principles of text detection methods to the visual domain. Our key insight is that a Masked AutoEncoder (MAE), trained to reconstruct masked patches conditioned on visible context, naturally encodes semantic consistency expectations. We formalize this reconstruction process probabilistically, computing the conditional Negative Log-Likelihood (NLL) of the masked patches given the visible ones, -log p(masked | visible), to quantify local semantic anomalies. By aggregating these patch-level statistics with global MAE features through learned fusion, CINEMAE achieves strong cross-generator generalization. Trained exclusively on Stable Diffusion v1.4, our method achieves over 95% accuracy on all eight unseen generators in the GenImage benchmark, substantially outperforming state-of-the-art detectors. This demonstrates that context-conditional reconstruction uncertainty provides a robust, transferable signal for AIGC detection.
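To make the patch-level NLL idea concrete, the sketch below scores masked patches by how poorly they are reconstructed. It is an illustration only, not the authors' implementation: the abstract does not specify the likelihood model, so an isotropic Gaussian with a fixed standard deviation `sigma` is assumed here, and the names `patch_nll` and `summarize_nll` are hypothetical. The reconstructions are taken as given; in practice they would come from a frozen MAE.

```python
import numpy as np


def patch_nll(patches, recon, mask, sigma=1.0):
    """Per-patch negative log-likelihood of masked patches.

    Assumption (not from the paper): each masked patch is modeled as an
    isotropic Gaussian centered on its reconstruction, with std `sigma`.

    patches, recon: arrays of shape (num_patches, patch_dim)
    mask: boolean array of shape (num_patches,), True where a patch was masked
    Returns an array with one NLL value per masked patch.
    """
    patches = np.asarray(patches, dtype=np.float64)
    recon = np.asarray(recon, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)

    patch_dim = patches.shape[1]
    # Squared reconstruction error, computed on masked patches only.
    sq_err = np.sum((patches[mask] - recon[mask]) ** 2, axis=1)
    # Gaussian NLL: error term plus the log-normalizer.
    log_norm = 0.5 * patch_dim * np.log(2.0 * np.pi * sigma**2)
    return 0.5 * sq_err / sigma**2 + log_norm


def summarize_nll(nll):
    """Hypothetical patch-level summary: mean, std, and max of the NLL values."""
    return np.array([nll.mean(), nll.std(), nll.max()])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(16, 8))
    # Stand-in for MAE output: the true patches plus small noise.
    recon = patches + 0.1 * rng.normal(size=patches.shape)
    mask = rng.random(16) < 0.75
    mask[0] = True  # ensure at least one masked patch
    print(summarize_nll(patch_nll(patches, recon, mask)))
```

Summary statistics like these could then be concatenated with a global feature vector and passed to a small classifier, which is one plausible reading of the "learned fusion" the abstract mentions.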
Similar Papers
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos
CV and Pattern Recognition
Teaches computers to understand sound, sight, and words.
CrossVideoMAE: Self-Supervised Image-Video Representation Learning with Masked Autoencoders
CV and Pattern Recognition
Teaches computers to understand videos better.
CoMA: Complementary Masking and Hierarchical Dynamic Multi-Window Self-Attention in a Unified Pre-training Framework
CV and Pattern Recognition
Teaches computers to see faster and better.