Progressive Class-level Distillation
By: Jiayan Li, Jun Li, Zhourui Zhang and more
Potential Business Impact:
Teaches small computers to learn from big ones better.
In knowledge distillation (KD), logit distillation (LD) aims to transfer class-level knowledge from a more powerful teacher network to a small student model via accurate teacher-student alignment at the logits level. Since high-confidence object classes usually dominate the distillation process, low-probability classes, which also carry discriminative information, are downplayed in conventional methods, leading to insufficient knowledge transfer. To address this issue, we propose a simple yet effective LD method termed Progressive Class-level Distillation (PCD). In contrast to existing methods, which perform all-class ensemble distillation, our PCD approach performs stage-wise distillation for step-by-step knowledge transfer. More specifically, we rank the teacher-student logit differences to establish distillation priority from the outset, and subsequently divide the entire LD process into multiple stages. Next, bidirectional stage-wise distillation incorporating fine-to-coarse progressive learning and reverse coarse-to-fine refinement is conducted, allowing comprehensive knowledge transfer via sufficient logits alignment within separate class groups in different distillation stages. Extensive experiments on public benchmark datasets demonstrate the superiority of our method over state-of-the-art approaches on both classification and detection tasks.
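The sketch below illustrates the general idea described in the abstract: classes are ranked by teacher-student logit discrepancy, split into groups, and a distillation loss is computed within each group, with an optional reversed (coarse-to-fine) pass. The number of stages, the temperature, the equal-size split, and the helper name pcd_stage_loss are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def pcd_stage_loss(student_logits, teacher_logits, num_stages=4,
                   temperature=4.0, coarse_to_fine=False):
    """Sketch of stage-wise class-level distillation (assumed details).

    Classes are ranked by mean absolute teacher-student logit difference
    and split into `num_stages` groups; a KL loss is computed within each
    group over renormalized (sub-softmax) distributions.
    """
    # Rank classes by batch-averaged teacher-student logit discrepancy.
    diff = (teacher_logits - student_logits).abs().mean(dim=0)   # [C]
    order = torch.argsort(diff, descending=True)                 # distillation priority
    groups = torch.chunk(order, num_stages)                      # class groups per stage
    if coarse_to_fine:
        groups = groups[::-1]                                    # reversed refinement pass

    loss = 0.0
    for idx in groups:
        # Align logits only within this class group (softmax over the subset).
        s = F.log_softmax(student_logits[:, idx] / temperature, dim=1)
        t = F.softmax(teacher_logits[:, idx] / temperature, dim=1)
        loss = loss + F.kl_div(s, t, reduction="batchmean") * temperature ** 2
    return loss / num_stages

# Usage example: combine a forward (fine-to-coarse) and a reversed pass.
s_logits = torch.randn(8, 100, requires_grad=True)
t_logits = torch.randn(8, 100)
loss = (pcd_stage_loss(s_logits, t_logits)
        + pcd_stage_loss(s_logits, t_logits, coarse_to_fine=True))
loss.backward()
```

Whether the stages correspond to separate training phases or to per-batch passes, and how the two directions are weighted, is not specified here; this sketch simply sums both directions for illustration.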
Similar Papers
PLD: A Choice-Theoretic List-Wise Knowledge Distillation
Machine Learning (CS)
Teaches smaller computer brains to think like bigger ones.
Swapped Logit Distillation via Bi-level Teacher Alignment
Machine Learning (CS)
Makes small computers learn as well as big ones.
Parameter-Free Logit Distillation via Sorting Mechanism
Signal Processing
Makes small computers learn as well as big ones.