Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory
By: Huiyan Xue, Xuming Ran, Yaxin Li, and more
Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as Sparse Distributed Memory Multi-Layer Perceptrons (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity limits cross-task knowledge reuse and leads to performance degradation under high sparsity. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer but as a topology-aligned information conduit. SSD identifies neurons with high activation frequency and selectively distills knowledge within the previous tasks' Top-K subnetworks and through the output logits, without requiring replay or task labels. This enables structural realignment while preserving sparse modularity. Experiments on Split CIFAR-10, CIFAR-100, and MNIST demonstrate that SSD improves accuracy, retention, and representation coverage, offering a structurally grounded solution for sparse continual learning.
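To make the core idea concrete, the sketch below shows one way the SSD-style loss described in the abstract could be assembled in PyTorch: feature distillation restricted to the previous task's high-frequency Top-K neurons, plus standard logit distillation. This is not the authors' released code; all names (ssd_distillation_loss, activation_freq, alpha, beta, temperature) and the specific loss terms (MSE on masked features, temperature-scaled KL on logits) are illustrative assumptions based only on the abstract.

```python
# Minimal sketch of a selective-subnetwork distillation loss, assuming the
# hidden layer is [batch, hidden_dim] and activation_freq records how often
# each neuron fired in the previous task's Top-K subnetwork.
import torch
import torch.nn.functional as F


def ssd_distillation_loss(student_hidden, teacher_hidden,
                          student_logits, teacher_logits,
                          activation_freq, k,
                          temperature=2.0, alpha=1.0, beta=1.0):
    """Distill only through the k most frequently active neurons of the
    previous task's subnetwork, plus the output logits (hypothetical form)."""
    # Neurons with the highest activation frequency define the teacher's
    # sparse subnetwork for the previous task.
    topk_idx = torch.topk(activation_freq, k).indices
    mask = torch.zeros_like(teacher_hidden)
    mask[:, topk_idx] = 1.0

    # Feature-level distillation restricted to the selected subnetwork.
    hidden_loss = F.mse_loss(student_hidden * mask, teacher_hidden * mask)

    # Logit-level distillation via temperature-scaled KL divergence.
    logit_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    return alpha * hidden_loss + beta * logit_loss
```

In this reading, the mask plays the role of the "topology-aligned information conduit": gradients from the distillation term flow only through neurons that belonged to the earlier task's Top-K subnetwork, so new-task neurons remain free to specialize and no replay buffer or task label is needed at training time.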