Difficulty-guided Sampling: Bridging the Target Gap between Dataset Distillation and Downstream Tasks
By: Mingzhuo Li, Guang Li, Linfeng Ye, and more
In this paper, we propose difficulty-guided sampling (DGS) to bridge the target gap between the distillation objective and the downstream task, thereby improving the performance of dataset distillation. Deep neural networks achieve remarkable performance, but their training is time- and storage-intensive. Dataset distillation addresses this by generating compact, high-quality distilled datasets that enable effective model training while maintaining downstream performance. Existing approaches typically focus on features extracted from the original dataset and overlook task-specific information, which leads to a target gap between the distillation objective and the downstream task. We propose incorporating characteristics that benefit downstream training into dataset distillation to bridge this gap. Focusing on the downstream task of image classification, we introduce the concept of difficulty and propose DGS as a plug-in post-stage sampling module: the final distilled dataset is sampled from image pools generated by existing methods so that it follows a specified target difficulty distribution. We also propose difficulty-aware guidance (DAG) to explore the effect of difficulty during the generation process. Extensive experiments across multiple settings demonstrate the effectiveness of the proposed methods and highlight the broader potential of difficulty for diverse downstream tasks.
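As a rough illustration of the post-stage sampling idea (a sketch, not the authors' exact procedure), the code below assumes a per-image difficulty score, here proxied by a pretrained classifier's per-sample cross-entropy loss, and resamples an oversized image pool so the selected subset approximates a target difficulty histogram. The function names (`difficulty_scores`, `sample_by_difficulty`), the loss-based proxy, and the binning scheme are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import torch
import torch.nn.functional as F

def difficulty_scores(model, images, labels):
    """Proxy difficulty: per-sample cross-entropy loss of a pretrained
    classifier (an assumed stand-in for the paper's difficulty measure)."""
    model.eval()
    with torch.no_grad():
        logits = model(images)
        # Higher loss -> harder example under this proxy.
        return F.cross_entropy(logits, labels, reduction="none").cpu().numpy()

def sample_by_difficulty(scores, target_hist, n_select, n_bins=10, seed=0):
    """Select roughly n_select indices from an oversized pool so that the
    subset's difficulty histogram approximates target_hist (length n_bins,
    summing to 1). Purely illustrative of difficulty-guided sampling."""
    rng = np.random.default_rng(seed)
    # Quantile bin edges over the pool's difficulty scores.
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.digitize(scores, edges[1:-1]), 0, n_bins - 1)
    selected = []
    for b in range(n_bins):
        pool = np.where(bins == b)[0]
        # Allocate this bin's share of the budget; rounding may make the
        # final subset deviate slightly from n_select.
        k = min(int(round(target_hist[b] * n_select)), len(pool))
        selected.extend(rng.choice(pool, size=k, replace=False))
    return np.asarray(selected)
```

With a uniform `target_hist`, this counteracts the pool's natural difficulty skew; concentrating the target distribution on easier or harder bins would instead bias the distilled set toward that regime, which is the kind of knob a difficulty-guided sampler exposes.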