Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification
By: Aakash Gore, Anoushka Dey, Aryan Mishra
Potential Business Impact:
Teaches small computers to learn like big ones.
Knowledge distillation has emerged as a powerful technique for model compression, enabling the transfer of knowledge from large teacher networks to compact student models. However, traditional knowledge distillation methods treat all teacher predictions equally, regardless of the teacher's confidence in those predictions. This paper proposes an uncertainty-aware dual-student knowledge distillation framework that leverages teacher prediction uncertainty to selectively guide student learning. We introduce a peer-learning mechanism in which two heterogeneous student architectures, ResNet-18 and MobileNetV2, learn collaboratively from both the teacher network and each other. Experimental results on ImageNet-100 demonstrate that our approach outperforms baseline knowledge distillation methods, with ResNet-18 achieving 83.84% top-1 accuracy and MobileNetV2 achieving 81.46% top-1 accuracy, improvements of 2.04% and 0.92%, respectively, over traditional single-student distillation.
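The abstract does not spell out the loss formulation, so the following is a minimal PyTorch sketch under stated assumptions rather than the paper's exact method: teacher uncertainty is taken as the entropy of its softened predictions, each distillation term is weighted by per-sample confidence (one minus normalized entropy), and peer learning is a symmetric KL term between the two students. The function names, coefficients alpha and beta, and temperature T are illustrative placeholders.

```python
# Sketch of uncertainty-aware dual-student distillation (assumed formulation, not from the paper).
import torch
import torch.nn.functional as F

def uncertainty_weights(teacher_logits, T=4.0):
    """Per-sample confidence weights from the entropy of the teacher's softened predictions."""
    probs = F.softmax(teacher_logits / T, dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    max_entropy = torch.log(torch.tensor(float(teacher_logits.size(1))))
    return 1.0 - entropy / max_entropy  # confident teacher -> weight near 1

def kd_loss(student_logits, teacher_logits, weights, T=4.0):
    """Uncertainty-weighted KL divergence between student and teacher distributions."""
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    per_sample = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)
    return (weights * per_sample).mean() * T * T

def dual_student_loss(logits_r18, logits_mbv2, teacher_logits, labels,
                      alpha=0.5, beta=0.3, T=4.0):
    """Combined objective for the two students (ResNet-18 and MobileNetV2)."""
    w = uncertainty_weights(teacher_logits, T)
    # Standard supervised terms.
    ce = F.cross_entropy(logits_r18, labels) + F.cross_entropy(logits_mbv2, labels)
    # Teacher distillation, downweighted where the teacher is uncertain.
    kd = kd_loss(logits_r18, teacher_logits, w, T) + kd_loss(logits_mbv2, teacher_logits, w, T)
    # Peer-learning term: each student also matches the other's (detached) predictions.
    peer = (F.kl_div(F.log_softmax(logits_r18 / T, dim=1),
                     F.softmax(logits_mbv2.detach() / T, dim=1),
                     reduction="batchmean")
            + F.kl_div(F.log_softmax(logits_mbv2 / T, dim=1),
                       F.softmax(logits_r18.detach() / T, dim=1),
                       reduction="batchmean")) * T * T
    return ce + alpha * kd + beta * peer
```

Detaching the peer's logits treats the counterpart's predictions as fixed targets for each student, a common choice in mutual-learning setups; whether the paper does the same is not stated in the abstract.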
Similar Papers
Revisiting Cross-Architecture Distillation: Adaptive Dual-Teacher Transfer for Lightweight Video Models
CV and Pattern Recognition
Teaches small computers to see actions like big ones.
Architectural Insights into Knowledge Distillation for Object Detection: A Comprehensive Review
CV and Pattern Recognition
Makes smart cameras work on small devices.
Dual-Model Weight Selection and Self-Knowledge Distillation for Medical Image Classification
CV and Pattern Recognition
Makes small computers diagnose diseases from scans.