Can ChatGPT Perform Image Splicing Detection? A Preliminary Study
By: Souradip Nath
Potential Business Impact:
Finds fake pictures by looking at clues.
Multimodal Large Language Models (MLLMs) such as GPT-4V can reason across text and image modalities and show promise on a variety of complex vision-language tasks. In this preliminary study, we investigate the out-of-the-box capabilities of GPT-4V in image forensics, specifically in detecting image splicing manipulations. Without any task-specific fine-tuning, we evaluate GPT-4V using three prompting strategies, Zero-Shot (ZS), Few-Shot (FS), and Chain-of-Thought (CoT), applied to a curated subset of the CASIA v2.0 splicing dataset. Our results show that GPT-4V achieves competitive detection performance in the zero-shot setting (more than 85% accuracy), with CoT prompting yielding the most balanced trade-off between authentic and spliced images. Qualitative analysis further reveals that the model not only detects low-level visual artifacts but also draws on real-world contextual knowledge, such as object scale, semantic consistency, and architectural facts, to identify implausible composites. While GPT-4V lags behind specialized state-of-the-art splicing detection models, its generalizability, interpretability, and encyclopedic reasoning highlight its potential as a flexible tool in image forensics.
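To make the three prompting strategies and the notion of a balanced trade-off more concrete, the following is a minimal sketch in Python. The prompt wording, the function names, and the answer-parsing rule are all illustrative assumptions, not the prompts or code used in the study, and no model API is called: the sketch only builds prompt text, parses a free-text response into a label, and computes accuracy separately for authentic and spliced images.

```python
from typing import Dict, List, Tuple

# Hypothetical task instruction; the study's actual prompt text is not given here.
TASK = (
    "Decide whether the image is authentic or spliced. "
    "Answer with one word: 'authentic' or 'spliced'."
)


def zero_shot_prompt() -> str:
    """Zero-Shot (ZS): the task instruction alone, with no examples."""
    return TASK


def few_shot_prompt(examples: List[Tuple[str, str]]) -> str:
    """Few-Shot (FS): labeled examples precede the task instruction.

    Each example is an (image reference, label) pair; how images are
    actually attached to the request is left out of this sketch.
    """
    lines = [
        f"Example {i}: image {ref} -> {label}"
        for i, (ref, label) in enumerate(examples, start=1)
    ]
    return "\n".join(lines + [TASK])


def chain_of_thought_prompt() -> str:
    """Chain-of-Thought (CoT): ask for step-by-step reasoning before the answer."""
    return (
        "Examine the image step by step: lighting and shadows, edges and "
        "boundaries, object scale, and semantic plausibility. Explain your "
        "reasoning, then finish with 'Answer: authentic' or 'Answer: spliced'."
    )


def parse_label(response: str) -> str:
    """Map a free-text response to a label using the last label word mentioned.

    This is a simplistic assumption; it would misread words such as
    'inauthentic', so a real evaluation would need a stricter format.
    """
    text = response.lower()
    last_authentic = text.rfind("authentic")
    last_spliced = text.rfind("spliced")
    if last_authentic == -1 and last_spliced == -1:
        return "unknown"
    return "authentic" if last_authentic > last_spliced else "spliced"


def per_class_accuracy(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Accuracy computed separately for each gold class."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for g, p in zip(gold, pred):
        totals[g] = totals.get(g, 0) + 1
        hits[g] = hits.get(g, 0) + (g == p)
    return {label: hits[label] / totals[label] for label in totals}
```

Reporting accuracy per class, rather than a single overall figure, is what exposes a skewed detector: a model that labels nearly everything as spliced can score well on spliced images while misclassifying many authentic ones.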