Model compression using knowledge distillation with integrated gradients

Published: June 17, 2025 | arXiv ID: 2506.14440v1

By: David E. Hernandez, Jose Chang, Torbjörn E. M. Nordling

Potential Business Impact:

Makes smart computer programs smaller and faster.

Business Areas:

Image Recognition, Data and Analytics, Software

Model compression is critical for deploying deep learning models on resource-constrained devices. We introduce a novel method enhancing knowledge distillation with integrated gradients (IG) as a data augmentation strategy. Our approach overlays IG maps onto input images during training, providing student models with deeper insights into teacher models' decision-making processes. Extensive evaluation on CIFAR-10 demonstrates that our IG-augmented knowledge distillation achieves 92.6% testing accuracy with a 4.1x compression factor, a significant 1.1 percentage point improvement ($p<0.001$) over non-distilled models (91.5%). This compression reduces inference time from 140 ms to 13 ms. Our method precomputes IG maps before training, transforming substantial runtime costs into a one-time preprocessing step. Our comprehensive experiments include: (1) comparisons with attention transfer, revealing complementary benefits when combined with our approach; (2) Monte Carlo simulations confirming statistical robustness; (3) systematic evaluation of compression factor versus accuracy trade-offs across a wide range (2.2x to 1122x); and (4) validation on an ImageNet subset aligned with CIFAR-10 classes, demonstrating generalisability beyond the initial dataset. These extensive ablation studies confirm that IG-based knowledge distillation consistently outperforms conventional approaches across varied architectures and compression ratios. Our results establish this framework as a viable compression technique for real-world deployment on edge devices while maintaining competitive accuracy.
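The abstract names three ingredients: integrated gradients computed on the teacher, an overlay of the resulting map onto the input image, and a distillation loss. The sketch below illustrates each in plain Python on a toy example. It is not the authors' implementation. The logistic "teacher", the all-zero baseline, the blending rule in `overlay`, the weight `alpha`, and the temperature `T` are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def teacher_score(x, w):
    """Toy stand-in for a teacher: logistic score on a flattened image (assumption)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def teacher_grad(x, w):
    """Analytic gradient of teacher_score with respect to the input x."""
    s = teacher_score(x, w)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, w, baseline=None, steps=50):
    """Midpoint Riemann-sum approximation of IG along the straight path baseline -> x."""
    if baseline is None:
        baseline = [0.0] * len(x)  # all-black baseline (assumption)
    total = [0.0] * len(x)
    for k in range(steps):
        a = (k + 0.5) / steps
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        grad = teacher_grad(point, w)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

def overlay(x, ig, alpha=0.5):
    """Blend the image with its normalised |IG| map; this blending rule is hypothetical."""
    mags = [abs(v) for v in ig]
    peak = max(mags) or 1.0  # avoid division by zero for an all-zero map
    return [min(1.0, max(0.0, (1.0 - alpha) * xi + alpha * m / peak))
            for xi, m in zip(x, mags)]

def softmax(logits, T=1.0):
    m = max(logits)
    exps = [math.exp((l - m) / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Precompute the IG map once, then reuse the augmented input during student training.
image = [0.2, 0.8, 0.5, 0.0]      # four "pixels" in [0, 1]
weights = [1.0, -2.0, 0.5, 3.0]   # hypothetical teacher parameters
ig_map = integrated_gradients(image, weights)
augmented = overlay(image, ig_map)
```

Because the IG map depends only on the teacher and the input, it can be computed once per training image and cached, which is the one-time preprocessing step the abstract describes. A useful sanity check on any IG implementation is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline.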

Knowledge Distillation: Enhancing Neural Network Compression with Integrated Gradients

Machine Learning (CS)

Makes smart computer programs run much faster.

17 Mar 2025 0

89%

Efficient Learned Image Compression Through Knowledge Distillation

CV and Pattern Recognition

Makes AI image compression faster and use less power.

12 Sep 2025 1

88%

Lightweight Task-Oriented Semantic Communication Empowered by Large-Scale AI Models

Machine Learning (CS)

Makes AI communication faster and smarter.

16 Jun 2025 1

Page Count

49 pages
