WeCKD: Weakly-supervised Chained Distillation Network for Efficient Multimodal Medical Imaging
By: Md. Abdur Rahman, Mohaimenul Azam Khan Raiaan, Sami Azam, et al.
Potential Business Impact:
Teaches computers to learn from less data.
Knowledge distillation (KD) has traditionally relied on a static teacher-student framework, where a large, well-trained teacher transfers knowledge to a single student model. However, these approaches often suffer from knowledge degradation, inefficient supervision, and reliance on either a very strong teacher model or large labeled datasets, which limits their effectiveness in real-world, limited-data scenarios. To address these limitations, we present the first Weakly-supervised Chain-based KD network (WeCKD), which redefines knowledge transfer through a structured sequence of interconnected models. Unlike conventional KD, it forms a progressive distillation chain, where each model not only learns from its predecessor but also refines the knowledge before passing it forward. This structured knowledge transfer enhances feature learning, reduces data dependency, and mitigates the limitations of one-step KD. Each model in the distillation chain is trained on only a fraction of the dataset, demonstrating that effective learning can be achieved with minimal supervision. Extensive evaluations across four otoscopic imaging datasets show that WeCKD not only matches but in many cases surpasses the performance of existing supervised methods. Experimental results on two additional datasets further underscore its generalization across diverse medical imaging modalities, including microscopic and magnetic resonance imaging. Furthermore, our evaluations yielded cumulative accuracy gains of up to +23% over a single backbone trained on the same limited data, which highlights its potential for real-world adoption.
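To make the chained-distillation idea concrete, here is a minimal sketch of how such a chain could be trained. This is a hypothetical illustration assuming a PyTorch setting, not the authors' released code: the `kd_loss` objective (temperature-softened KL plus cross-entropy), the per-shard training schedule, and hyperparameters like `T` and `alpha` are common KD choices assumed for illustration; the paper's actual architecture and weak-supervision details are not specified in the abstract.

```python
# Hypothetical sketch of a chained KD loop: models[0] learns from labels
# alone on its data shard; each later model[i] distills from the frozen
# model[i-1] on a different small shard, then becomes the next teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard KD objective: weighted sum of KL divergence between
    temperature-softened distributions and cross-entropy on hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def train_chain(models, data_shards, epochs=10, lr=1e-3, device="cpu"):
    """Train a distillation chain; each model sees only its own shard."""
    for i, (model, loader) in enumerate(zip(models, data_shards)):
        model.to(device).train()
        teacher = models[i - 1].to(device).eval() if i > 0 else None
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                logits = model(x)
                if teacher is None:
                    loss = F.cross_entropy(logits, y)  # first link: labels only
                else:
                    with torch.no_grad():
                        t_logits = teacher(x)  # predecessor's soft targets
                    loss = kd_loss(logits, t_logits, y)
                opt.zero_grad()
                loss.backward()
                opt.step()
        model.eval()  # freeze before it serves as the next link's teacher
    return models[-1]
```

Because each link trains on a different fraction of the data, supervision stays weak per model while knowledge accumulates along the chain; the paper's refinement step between links would replace or augment the plain soft-target transfer shown here.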
Similar Papers
Asymmetric Cross-Modal Knowledge Distillation: Bridging Modalities with Weak Semantic Consistency
CV and Pattern Recognition
Teaches computers to learn from different kinds of pictures.
Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification
CV and Pattern Recognition
Teaches small computers to learn like big ones.
Dual-Model Weight Selection and Self-Knowledge Distillation for Medical Image Classification
CV and Pattern Recognition
Makes small computers diagnose diseases from scans.