Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory

Published: December 17, 2025 | arXiv ID: 2512.15267v1

By: Huiyan Xue, Xuming Ran, Yaxin Li and more

Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as Sparse Distributed Memory Multi-Layer Perceptrons (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity limits cross-task knowledge reuse and leads to performance degradation under high sparsity. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer but as a topology-aligned information conduit. SSD identifies neurons with high activation frequency and selectively distills knowledge within previous Top-K subnetworks and output logits, without requiring replay or task labels. This enables structural realignment while preserving sparse modularity. Experiments on Split CIFAR-10, CIFAR-100, and MNIST demonstrate that SSD improves accuracy, retention, and representation coverage, offering a structurally grounded solution for sparse continual learning.
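
The abstract describes the mechanism only at a high level, so the following is a minimal NumPy sketch of one way the pieces could fit together, not the authors' implementation. It shows Top-K masking, selection of neurons by how often they land in the teacher's Top-K set, and a distillation loss restricted to those neurons plus the output logits. The function names, the `min_freq` threshold, the squared-error hidden term, the temperature-softened KL term, and the `alpha` weighting are all assumptions made for illustration; none of them are stated in the abstract.

```python
import numpy as np

def top_k_mask(h, k):
    """Boolean mask marking the k largest activations in each row of h."""
    idx = np.argpartition(h, -k, axis=1)[:, -k:]
    mask = np.zeros(h.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

def frequent_neurons(h_teacher, k, min_freq):
    """Neurons inside the teacher's Top-K set on at least min_freq of the batch.

    min_freq is a hypothetical threshold, not a value from the paper.
    """
    freq = top_k_mask(h_teacher, k).mean(axis=0)
    return freq >= min_freq

def softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ssd_loss(h_student, h_teacher, logits_student, logits_teacher,
             k, min_freq=0.5, temperature=2.0, alpha=0.5):
    """Assumed loss: masked hidden-state error plus KL on softened logits."""
    # Restrict the hidden term to positions that are both in the teacher's
    # Top-K subnetwork and belong to a frequently active neuron.
    selected = frequent_neurons(h_teacher, k, min_freq)
    active = top_k_mask(h_teacher, k) & selected
    diff = (h_student - h_teacher) * active
    hidden_term = (diff ** 2).sum() / max(active.sum(), 1)

    # KL(teacher || student) on temperature-softened output distributions.
    p_t = softmax(logits_teacher, temperature)
    p_s = softmax(logits_student, temperature)
    eps = 1e-12
    logit_term = (p_t * (np.log(p_t + eps) - np.log(p_s + eps))).sum(axis=1).mean()

    return alpha * hidden_term + (1 - alpha) * logit_term
```

In this sketch the teacher would be a frozen copy of the network from before the current task, and the loss uses only current-task inputs, which is consistent with the abstract's statement that no replay or task labels are required.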
