Label-Consistent Dataset Distillation with Detector-Guided Refinement
By: Yawen Zou, Guang Li, Zi Wang, and more
Potential Business Impact:
Generates compact synthetic training sets, cutting the storage and compute needed to train models.
Dataset distillation (DD) aims to generate a compact yet informative dataset that achieves performance comparable to training on the full original dataset, thereby reducing storage and computational demands. Although diffusion models have made significant progress in dataset distillation, the surrogate datasets they generate often contain samples with label inconsistencies or insufficient structural detail, leading to suboptimal downstream performance. To address these issues, we propose a detector-guided dataset distillation framework that explicitly leverages a pre-trained detector to identify and refine anomalous synthetic samples, ensuring label consistency and improving image quality. Specifically, a detector model trained on the original dataset identifies anomalous images exhibiting label mismatches or low classification confidence. For each defective image, multiple candidates are generated by a pre-trained diffusion model conditioned on the corresponding image prototype and label. The optimal candidate is then selected by jointly considering the detector's confidence score and its dissimilarity to existing qualified synthetic samples, ensuring both label accuracy and intra-class diversity. Experimental results demonstrate that our method synthesizes high-quality, representative images with richer details, achieving state-of-the-art performance on the validation set.
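The refinement loop described in the abstract reduces to a concrete procedure: flag synthetic images whose predicted label disagrees with the target or whose detector confidence is low, regenerate candidates from the prototype and label, and keep the candidate that scores best on a confidence-minus-redundancy criterion. Below is a minimal PyTorch sketch of that loop under stated assumptions; the interfaces `detector`, `diffusion_sample`, and `feature_fn`, along with `conf_threshold`, `num_candidates`, and `diversity_weight`, are hypothetical placeholders, since the abstract does not specify the paper's actual APIs or hyperparameters.

```python
import torch
import torch.nn.functional as F


def refine_synthetic_set(images, labels, prototypes, detector, diffusion_sample,
                         feature_fn, conf_threshold=0.8, num_candidates=8,
                         diversity_weight=0.5):
    """Replace anomalous synthetic images with detector-validated candidates.

    images:      (N, C, H, W) synthetic images from the diffusion model
    labels:      (N,) intended class labels
    prototypes:  (N, C, H, W) image prototypes used to condition regeneration
    detector:    classifier pre-trained on the original dataset (returns logits)
    diffusion_sample(prototype, label, k): hypothetical sampler returning
                 k candidate images of shape (k, C, H, W)
    feature_fn:  hypothetical feature extractor used for similarity scoring
    """
    detector.eval()
    with torch.no_grad():
        probs = F.softmax(detector(images), dim=1)
        conf, pred = probs.max(dim=1)
        # Flag anomalies: label mismatch or low classification confidence.
        anomalous = (pred != labels) | (conf < conf_threshold)

        # Features of already-qualified samples, used for the diversity term.
        keep = ~anomalous
        qualified = feature_fn(images[keep]) if keep.any() else None

        for i in torch.nonzero(anomalous).flatten().tolist():
            candidates = diffusion_sample(prototypes[i], labels[i], num_candidates)
            cand_conf = F.softmax(detector(candidates), dim=1)[:, labels[i]]
            cand_feats = feature_fn(candidates)

            if qualified is None:
                sim = torch.zeros_like(cand_conf)
            else:
                # Max cosine similarity of each candidate to qualified samples.
                sim = F.cosine_similarity(
                    cand_feats.unsqueeze(1), qualified.unsqueeze(0), dim=2
                ).max(dim=1).values

            # Joint criterion: high label confidence, low redundancy.
            best = (cand_conf - diversity_weight * sim).argmax()
            images[i] = candidates[best]
            new_feat = cand_feats[best : best + 1]
            qualified = new_feat if qualified is None else torch.cat([qualified, new_feat])
    return images
```

In this sketch, the score `cand_conf - diversity_weight * sim` mirrors the joint selection criterion from the abstract: it rewards high detector confidence for the intended label while penalizing candidates that closely resemble samples already accepted into the class, preserving intra-class diversity.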
Similar Papers
Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling
CV and Pattern Recognition
Tailors distilled datasets to downstream tasks by sampling synthetic examples according to difficulty.
Dataset Distillation for Pre-Trained Self-Supervised Vision Models
CV and Pattern Recognition
Distills compact image datasets for pre-trained self-supervised vision models.
Leveraging Multi-Modal Information to Enhance Dataset Distillation
CV and Pattern Recognition
Leverages multi-modal information to improve the quality of distilled datasets.