Distill Video Datasets into Images
By: Zhenghao Zhao , Haoxuan Wang , Kai Wang and more
Potential Business Impact:
Makes AI learn from short video clips faster.
Dataset distillation aims to synthesize compact yet informative datasets that allow models trained on them to achieve performance comparable to training on the full dataset. While this approach has shown promising results for image data, extending dataset distillation methods to video data has proven challenging and often leads to suboptimal performance. In this work, we first identify the core challenge in video set distillation as the substantial increase in learnable parameters introduced by the temporal dimension of video, which complicates optimization and hinders convergence. To address this issue, we observe that a single frame is often sufficient to capture the discriminative semantics of a video. Leveraging this insight, we propose Single-Frame Video set Distillation (SFVD), a framework that distills videos into highly informative frames for each class. Using differentiable interpolation, these frames are transformed into video sequences and matched with the original dataset, while updates are restricted to the frames themselves for improved optimization efficiency. To further incorporate temporal information, the distilled frames are combined with sampled real videos from real videos during the matching process through a channel reshaping layer. Extensive experiments on multiple benchmarks demonstrate that SFVD substantially outperforms prior methods, achieving improvements of up to 5.3% on MiniUCF, thereby offering a more effective solution.
Similar Papers
Dataset Distillation for Pre-Trained Self-Supervised Vision Models
CV and Pattern Recognition
Creates small, smart picture sets for AI.
Video Dataset Condensation with Diffusion Models
CV and Pattern Recognition
Makes huge video collections much smaller.
Dynamic-Aware Video Distillation: Optimizing Temporal Resolution Based on Video Semantics
CV and Pattern Recognition
Makes video learning faster by removing extra frames.