SKDU at De-Factify 4.0: Vision Transformer with Data Augmentation for AI-Generated Image Detection
By: Shrikant Malviya, Neelanjan Bhowmik, Stamos Katsigiannis
Potential Business Impact:
Finds fake pictures made by computers.
The aim of this work is to explore the potential of pre-trained vision models, e.g., the Vision Transformer (ViT), enhanced with advanced data augmentation strategies for the detection of AI-generated images. Our approach leverages a fine-tuned ViT model trained on the Defactify-4.0 dataset, which includes images generated by state-of-the-art models such as Stable Diffusion 2.1, Stable Diffusion XL, Stable Diffusion 3, DALL-E 3, and MidJourney. We employ perturbation techniques such as flipping, rotation, Gaussian noise injection, and JPEG compression during training to improve model robustness and generalisation. The experimental results demonstrate that our ViT-based pipeline achieves state-of-the-art performance, significantly outperforming competing methods on both the validation and test datasets.
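A minimal sketch of how such an augmentation-driven ViT fine-tuning setup could look in PyTorch, assuming torchvision for the perturbations and the google/vit-base-patch16-224 checkpoint from Hugging Face transformers as the backbone; the JPEG quality range, rotation angle, noise level, augmentation probabilities, and binary label head are illustrative assumptions, not values reported in the abstract.

```python
import io
import random

import torch
from PIL import Image
from torchvision import transforms
from transformers import ViTForImageClassification


class RandomJPEGCompression:
    """Re-encode a PIL image as JPEG at a random quality to simulate compression artefacts."""

    def __init__(self, quality_range=(30, 90), p=0.5):  # illustrative values
        self.quality_range = quality_range
        self.p = p

    def __call__(self, img):
        if random.random() > self.p:
            return img
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(*self.quality_range))
        buf.seek(0)
        return Image.open(buf).convert("RGB")


class RandomGaussianNoise:
    """Add zero-mean Gaussian noise to a tensor image and clamp it back to [0, 1]."""

    def __init__(self, std=0.05, p=0.5):  # illustrative values
        self.std = std
        self.p = p

    def __call__(self, tensor):
        if random.random() > self.p:
            return tensor
        return (tensor + torch.randn_like(tensor) * self.std).clamp(0.0, 1.0)


# Training-time perturbations: JPEG compression, flipping, rotation, Gaussian noise.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),          # ViT-Base/16 input resolution
    RandomJPEGCompression(),                # applied to the PIL image
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),  # illustrative rotation range
    transforms.ToTensor(),
    RandomGaussianNoise(),                  # applied to the tensor
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Hypothetical binary head (real vs. AI-generated); the checkpoint name is an assumption.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=2,
    ignore_mismatched_sizes=True,
)
```

From here the model could be fine-tuned with a standard cross-entropy loop or the transformers Trainer; the perturbations above would be applied only to training images, leaving validation and test inputs unperturbed.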
Similar Papers
Advance Fake Video Detection via Vision Transformers
CV and Pattern Recognition
Finds fake videos made by computers.
Edge-Enhanced Vision Transformer Framework for Accurate AI-Generated Image Detection
CV and Pattern Recognition
Finds fake pictures made by computers.
DenSe-AdViT: A novel Vision Transformer for Dense SAR Object Detection
CV and Pattern Recognition
Finds small, crowded objects in satellite pictures.