A Comprehensive Dataset for Human vs. AI Generated Image Detection
By: Rajarshi Roy, Nasrin Imanpour, Ashhar Aziz, and more
Potential Business Impact:
Helps tell real pictures from fake ones.
Multimodal generative AI systems such as Stable Diffusion, DALL-E, and MidJourney have fundamentally changed how synthetic images are created. These tools drive innovation but also enable the spread of misleading content, false information, and manipulated media. As generated images become harder to distinguish from photographs, detecting them has become an urgent priority. To address this challenge, we release MS COCOAI, a novel dataset for AI-generated image detection consisting of 96,000 real and synthetic data points, built on the MS COCO dataset. To generate the synthetic images, we use five generators: Stable Diffusion 3, Stable Diffusion 2.1, SDXL, DALL-E 3, and MidJourney v6. Based on the dataset, we propose two tasks: (1) classifying images as real or generated, and (2) identifying which model produced a given synthetic image. The dataset is available at https://huggingface.co/datasets/Rajarshi-Roy-research/Defactify_Image_Dataset.
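For readers who want to start experimenting, below is a minimal sketch of loading the dataset with the Hugging Face `datasets` library. The repository ID is taken from the link above; the split names and column schema are assumptions (they are not spelled out on this page), so the snippet inspects them rather than hard-coding any.

```python
# Minimal sketch: loading MS COCOAI with the Hugging Face `datasets` library.
# Assumption: the repository ID matches the link in the abstract. The split
# names and column schema are NOT confirmed here, so we print them instead of
# hard-coding column names such as "image" or "label".
from datasets import load_dataset

ds = load_dataset("Rajarshi-Roy-research/Defactify_Image_Dataset")

# List each split, its size, and its features so downstream code can map
# columns to the two proposed tasks: Task 1 (real vs. generated) and
# Task 2 (attributing a synthetic image to one of the five generators).
for split_name, split in ds.items():
    print(split_name, len(split), split.features)
```

Once the schema is known, Task 1 reduces to binary classification over the real/generated label, and Task 2 to five-way attribution over the generator identity.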
Similar Papers
OpenFake: An Open Dataset and Platform Toward Large-Scale Deepfake Detection
CV and Pattern Recognition
Helps stop fake pictures from spreading lies.
Methods and Trends in Detecting AI-Generated Images: A Comprehensive Review
CV and Pattern Recognition
Finds fake pictures made by AI.
Bridging the Gap Between Ideal and Real-world Evaluation: Benchmarking AI-Generated Image Detection in Challenging Scenarios
CV and Pattern Recognition
Tests if AI can spot fake pictures online.