Context-guided Responsible Data Augmentation with Diffusion Models
By: Khawar Islam, Naveed Akhtar
Potential Business Impact:
Makes AI better at recognizing pictures by adding carefully filtered synthetic ones to its training data.
Generative diffusion models are a natural choice for data augmentation when training complex vision models. However, ensuring the reliability of their generated content as augmentation samples remains an open challenge. Although a number of techniques use generated images to strengthen model training, it remains unclear how to use the combination of natural and generated images as a rich supervisory signal for effective model induction. To address this, we propose a text-to-image (T2I) data augmentation method, named DiffCoRe-Mix, that computes a set of generative counterparts for a training sample with an explicitly constrained diffusion model, which leverages sample-based context and negative prompting to generate reliable augmentation samples. To preserve key semantic axes, we also filter out undesired generated samples in our augmentation process, using a hard-cosine filtration in the embedding space of CLIP. Our approach systematically mixes the natural and generated images at the pixel and patch levels. We extensively evaluate our technique on the ImageNet-1K, Tiny ImageNet-200, CIFAR-100, Flowers102, CUB-Birds, Stanford Cars, and Caltech datasets, demonstrating a notable increase in performance across the board and achieving up to $\sim 3\%$ absolute gain in top-1 accuracy over state-of-the-art methods, with comparable computational overhead. Our code is publicly available at https://github.com/khawar-islam/DiffCoRe-Mix
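The filtering and mixing steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold value, the mixing weight `lam`, and the choice of comparing each candidate against a single reference embedding are all assumptions made for illustration, and plain Python lists stand in for CLIP embeddings and image tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hard_cosine_filter(reference, candidates, threshold=0.8):
    """Keep indices of candidates whose similarity to the reference meets a hard threshold.

    `reference` and `candidates` stand in for CLIP embeddings (assumption);
    the threshold of 0.8 is a hypothetical value, not taken from the paper.
    """
    return [
        i for i, cand in enumerate(candidates)
        if cosine_similarity(reference, cand) >= threshold
    ]


def pixel_mix(natural, generated, lam=0.5):
    """Pixel-level mix: convex blend of two flattened images (hypothetical weight lam)."""
    return [lam * n + (1.0 - lam) * g for n, g in zip(natural, generated)]


def patch_mix(natural, generated, top, left, size):
    """Patch-level mix: paste a square patch of the generated image into a copy of the natural one."""
    mixed = [row[:] for row in natural]
    for r in range(top, top + size):
        mixed[r][left:left + size] = generated[r][left:left + size]
    return mixed
```

In this sketch, a generated sample would first pass `hard_cosine_filter` and only then be combined with its natural counterpart through `pixel_mix` or `patch_mix`; how the paper chooses the threshold, the blend weight, and the patch location is not stated in the abstract.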
Similar Papers
Diverse Text-to-Image Generation via Contrastive Noise Optimization
Graphics
Makes AI pictures more different and interesting.
Preserving Product Fidelity in Large Scale Image Recontextualization with Diffusion Models
CV and Pattern Recognition
Makes product pictures look real in new places.
SGD-Mix: Enhancing Domain-Specific Image Classification with Label-Preserving Data Augmentation
CV and Pattern Recognition
Makes computer pictures more real for learning.