HPM-KD: Hierarchical Progressive Multi-Teacher Framework for Knowledge Distillation and Efficient Model Compression
By: Gustavo Coelho Haase, Paulo Henrique Dourado da Silva
Knowledge Distillation (KD) has emerged as a promising technique for model compression but faces critical limitations: (1) sensitivity to hyperparameters requiring extensive manual tuning, (2) a capacity gap when distilling from very large teachers to small students, (3) suboptimal coordination in multi-teacher scenarios, and (4) inefficient use of computational resources. We present HPM-KD, a framework that integrates six synergistic components: (i) an Adaptive Configuration Manager that uses meta-learning to eliminate manual hyperparameter tuning, (ii) a Progressive Distillation Chain with automatically determined intermediate models, (iii) an Attention-Weighted Multi-Teacher Ensemble that learns dynamic per-sample weights, (iv) a Meta-Learned Temperature Scheduler that adapts the temperature throughout training, (v) a Parallel Processing Pipeline with intelligent load balancing, and (vi) Shared Optimization Memory for cross-experiment reuse. Experiments on CIFAR-10, CIFAR-100, and tabular datasets demonstrate that HPM-KD achieves 10x-15x compression while retaining 85% of the original accuracy, eliminates the need for manual tuning, and reduces training time by 30-40% through parallelization. Ablation studies confirm the independent contribution of each component (0.10-0.98 pp). HPM-KD is available as part of the open-source DeepBridge library.
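To make the attention-weighted multi-teacher ensemble and temperature-scaled distillation concrete, the sketch below shows one plausible formulation in PyTorch. It is a minimal illustration under stated assumptions: attention_net is a hypothetical module (e.g., a small linear layer mapping student logits to per-teacher weights) and the temperature is fixed here rather than meta-learned; names and shapes are illustrative and do not reproduce the DeepBridge API.

# Minimal sketch of an attention-weighted multi-teacher KD loss (illustrative only).
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          attention_net, temperature=4.0, alpha=0.5):
    # Stack teacher logits into shape (num_teachers, batch, num_classes).
    teacher_logits = torch.stack(teacher_logits_list, dim=0)

    # Per-sample attention weights over teachers. Here they are derived from the
    # student's own logits; attention_net maps (batch, classes) -> (batch, teachers).
    weights = F.softmax(attention_net(student_logits.detach()), dim=-1)

    # Weighted mixture of temperature-softened teacher distributions.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)     # (T, B, C)
    mixed_teacher = torch.einsum('bt,tbc->bc', weights, teacher_probs)  # (B, C)

    # Temperature-scaled KL divergence between student and mixed teacher.
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, mixed_teacher, reduction='batchmean') * temperature ** 2

    # Standard cross-entropy on the hard labels, blended with the KD term.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

In practice, attention_net could be as simple as torch.nn.Linear(num_classes, num_teachers), and the fixed temperature would be replaced by the framework's meta-learned schedule.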