Head-Tail-Aware KL Divergence in Knowledge Distillation for Spiking Neural Networks
By: Tianqing Zhang, Zixin Zhu, Kairong Yu, and more
Potential Business Impact:
Teaches computer brains to learn better, faster.
Spiking Neural Networks (SNNs) have emerged as a promising approach for energy-efficient and biologically plausible computation. However, due to limitations in existing training methods and inherent model constraints, SNNs often exhibit a performance gap compared to Artificial Neural Networks (ANNs). Knowledge distillation (KD) has been explored as a technique to transfer knowledge from ANN teacher models to SNN student models to mitigate this gap. Traditional KD methods typically use Kullback-Leibler (KL) divergence to align output distributions. However, conventional KL-based approaches fail to fully exploit the unique characteristics of SNNs: they tend to overemphasize high-probability predictions while neglecting low-probability ones, leading to suboptimal generalization. To address this, we propose Head-Tail-Aware Kullback-Leibler (HTA-KL) divergence, a novel KD method for SNNs. HTA-KL introduces a cumulative probability-based mask to dynamically distinguish between high- and low-probability regions. It assigns adaptive weights to ensure balanced knowledge transfer, enhancing overall performance. By integrating forward KL (FKL) and reverse KL (RKL) divergence, our method effectively aligns both the head and tail regions of the distribution. We evaluate our method on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets, where it outperforms existing methods on most benchmarks while using fewer timesteps.
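The abstract describes HTA-KL as a cumulative-probability mask that routes the head (high-probability) region of the teacher distribution through forward KL and the tail through reverse KL. Below is a minimal PyTorch sketch of that idea; the temperature tau, the cutoff head_mass, and the hard 0/1 mask (standing in for the paper's adaptive weights) are illustrative assumptions, not the authors' exact formulation.

import torch
import torch.nn.functional as F

def hta_kl_loss(student_logits, teacher_logits, tau=4.0, head_mass=0.5):
    # Temperature-scaled distributions (standard KD practice).
    log_p = F.log_softmax(teacher_logits / tau, dim=-1)  # teacher
    log_q = F.log_softmax(student_logits / tau, dim=-1)  # student
    p, q = log_p.exp(), log_q.exp()

    # Cumulative-probability mask: sort teacher probs in descending order
    # and flag the smallest set of classes whose mass reaches head_mass.
    sorted_p, idx = p.sort(dim=-1, descending=True)
    in_head_sorted = sorted_p.cumsum(dim=-1) <= head_mass
    in_head_sorted[..., 0] = True  # the top class always belongs to the head
    head = torch.zeros_like(p).scatter(-1, idx, in_head_sorted.float())

    fkl = p * (log_p - log_q)  # forward KL terms (mode-covering, head)
    rkl = q * (log_q - log_p)  # reverse KL terms (mode-seeking, tail)
    loss = (head * fkl + (1.0 - head) * rkl).sum(dim=-1).mean()
    return loss * tau * tau  # usual KD temperature rescaling

In a typical KD setup this term would be combined with the standard cross-entropy loss on ground-truth labels, e.g. total = ce_loss + lambda_kd * hta_kl_loss(student(x), teacher(x)), with the weighting left as a hyperparameter.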
Similar Papers
Optimizing Knowledge Distillation in Transformers: Enabling Multi-Head Attention without Alignment Barriers
CV and Pattern Recognition
Lets smaller AI learn from bigger AI.
An Efficient GNNs-to-KANs Distillation via Self-Attention Dynamic Sampling with Potential for Consumer Electronics Edge Deployment
Machine Learning (CS)
Makes small computers learn like big ones faster.
Cross Knowledge Distillation between Artificial and Spiking Neural Networks
CV and Pattern Recognition
Teaches computers to learn better from different data.