RMT-KD: Random Matrix Theoretic Causal Knowledge Distillation
By: Davide Ettori, Nastaran Darabi, Sureshkumar Senthilkumar, and more
Potential Business Impact:
Makes big AI models smaller and faster.
Large deep learning models such as BERT and ResNet achieve state-of-the-art performance but are costly to deploy at the edge due to their size and compute demands. We present RMT-KD, a compression method that leverages Random Matrix Theory (RMT) for knowledge distillation to iteratively reduce network size. Instead of pruning or heuristic rank selection, RMT-KD preserves only informative directions identified via the spectral properties of hidden representations. RMT-based causal reduction is applied layer by layer with self-distillation to maintain stability and accuracy. On GLUE, AG News, and CIFAR-10, RMT-KD achieves up to 80% parameter reduction with only 2% accuracy loss, delivering 2.8x faster inference and nearly halved power consumption. These results establish RMT-KD as a mathematically grounded approach to network distillation.
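The abstract's key idea is keeping only the "informative directions" revealed by the spectral properties of a layer's hidden representations. A minimal sketch of that selection step is below, assuming the spectral test is a Marchenko-Pastur bulk-edge comparison (the standard RMT criterion; the paper's exact test, noise-variance estimate, and the helper name `mp_informative_directions` are illustrative assumptions, not the authors' implementation).

```python
import numpy as np

def mp_informative_directions(hidden, sigma2=None):
    """Keep only directions whose covariance eigenvalues exceed the
    Marchenko-Pastur upper edge, i.e. cannot be explained by pure noise.

    hidden: (n_samples, d) activations collected from one layer.
    sigma2: noise variance; crudely estimated from the data if not given.
    Returns a (d, k) projection basis onto the retained directions.
    """
    n, d = hidden.shape
    X = hidden - hidden.mean(axis=0, keepdims=True)
    cov = X.T @ X / n                          # sample covariance, (d, d)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order

    if sigma2 is None:
        sigma2 = np.median(eigvals)            # rough noise-scale estimate (assumption)
    q = d / n
    mp_upper = sigma2 * (1 + np.sqrt(q)) ** 2  # Marchenko-Pastur bulk edge

    keep = eigvals > mp_upper                  # "spikes" above the noise bulk
    return eigvecs[:, keep]

# Toy usage: 512-dim activations from 2000 samples with 20 planted signal directions.
rng = np.random.default_rng(0)
signal = rng.normal(size=(2000, 20)) @ rng.normal(size=(20, 512)) * 3.0
acts = signal + rng.normal(size=(2000, 512))

P = mp_informative_directions(acts)
print("retained directions:", P.shape[1], "of", acts.shape[1])
reduced = acts @ P  # compressed representation passed to the next, smaller layer
```

In the paper's pipeline this reduction would be applied layer by layer, with self-distillation used to keep the shrunken network's outputs close to the original's; the sketch covers only the rank-selection idea.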
Similar Papers
MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver
Machine Learning (CS)
Helps delivery trucks find best routes faster.
KD$^{2}$M: A unifying framework for feature knowledge distillation
Machine Learning (Stat)
Teaches computers to learn from other computers.
Pruning Deep Neural Networks via a Combination of the Marchenko-Pastur Distribution and Regularization
Machine Learning (CS)
Makes computer vision models smaller, faster.