Zoom-In to Sort AI-Generated Images Out
By: Yikun Ji, Yan Hong, Bowen Deng and more
Potential Business Impact:
Finds fake pictures by looking closer.
The rapid growth of AI-generated imagery has blurred the boundary between real and synthetic content, raising critical concerns for digital integrity. Vision-language models (VLMs) offer interpretability through explanations but often fail to detect subtle artifacts in high-quality synthetic images. We propose ZoomIn, a two-stage forensic framework that improves both accuracy and interpretability. Mimicking human visual inspection, ZoomIn first scans an image to locate suspicious regions and then performs a focused analysis on these zoomed-in areas to deliver a grounded verdict. To support training, we introduce MagniFake, a dataset of 20,000 real and high-quality synthetic images annotated with bounding boxes and forensic explanations, generated through an automated VLM-based pipeline. Our method achieves 96.39% accuracy with robust generalization, while providing human-understandable explanations grounded in visual evidence.
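The scan-then-zoom flow described above can be sketched in a few lines. This is only an illustrative outline under stated assumptions, not the authors' implementation: the `locate` and `classify` callables stand in for the model's two stages and are hypothetical, as are the names `expand_box`, `crop`, `zoom`, `two_stage_detect`, and `Verdict`, the 10% padding margin, and the toy list-of-rows image format.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
Image = Sequence[Sequence[int]]   # toy grayscale image: rows of pixel values


def expand_box(box: Box, width: int, height: int, margin: float = 0.1) -> Box:
    """Pad a box by a fraction of its size, then clamp it to the image bounds."""
    left, top, right, bottom = box
    pad_x = int((right - left) * margin)
    pad_y = int((bottom - top) * margin)
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region covered by the box out of the image."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def zoom(image: Image, factor: int) -> List[List[int]]:
    """Nearest-neighbour upscaling by an integer factor."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


@dataclass
class Verdict:
    label: str          # e.g. "real" or "synthetic"
    regions: List[Box]  # the regions the verdict is grounded in


def two_stage_detect(
    image: Image,
    locate: Callable[[Image], List[Box]],                  # hypothetical stage 1
    classify: Callable[[Image, List[List[List[int]]]], str],  # hypothetical stage 2
    factor: int = 2,
) -> Verdict:
    """Stage 1 proposes suspicious boxes; stage 2 judges the zoomed-in crops."""
    height = len(image)
    width = len(image[0]) if height else 0
    boxes = [expand_box(b, width, height) for b in locate(image)]
    crops = [zoom(crop(image, b), factor) for b in boxes]
    return Verdict(label=classify(image, crops), regions=boxes)
```

In a real system both callables would be calls into a vision-language model, and the crops would be resized with an image library rather than by pixel repetition. The point of the sketch is only the control flow: the verdict is returned together with the regions it was based on.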
Similar Papers
Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
CV and Pattern Recognition
Finds fake pictures and explains why.
SemVink: Advancing VLMs' Semantic Understanding of Optical Illusions via Visual Global Thinking
Computation and Language
Makes computers see hidden things in pictures.
TruthLens: A Training-Free Paradigm for DeepFake Detection
CV and Pattern Recognition
Finds fake pictures and explains why.