On Understanding of the Dynamics of Model Capacity in Continual Learning
By: Supriyo Chakraborty, Krishnan Raghavan
Potential Business Impact:
Helps computers learn new things without forgetting old ones.
The stability-plasticity dilemma, closely related to a neural network's (NN) capacity (its ability to represent tasks), is a fundamental challenge in continual learning (CL). Within this context, we introduce the notion of CL's effective model capacity (CLEMC), which characterizes the dynamic behavior of the stability-plasticity balance point. We develop a difference equation to model the evolution of the interplay between the NN, the task data, and the optimization procedure. We then leverage CLEMC to demonstrate that the effective capacity, and by extension the stability-plasticity balance point, is inherently non-stationary. We show that, regardless of the NN architecture or optimization method, an NN's ability to represent new tasks diminishes when incoming task distributions differ from previous ones. We conduct extensive experiments to support our theoretical findings, spanning a range of architectures, from small feedforward and convolutional networks to medium-sized graph neural networks and transformer-based large language models with millions of parameters.
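As a rough illustration only (not the paper's actual formulation), the capacity dynamics described above can be pictured as a first-order difference equation in which the effective capacity at task k+1 equals the capacity at task k minus a term driven by the shift between consecutive task distributions; the symbols C_k, D_k, eta, and Delta below are hypothetical placeholders introduced here for exposition:

\[
  C_{k+1} \;=\; C_k \;-\; \eta\,\Delta\!\left(\mathcal{D}_k,\,\mathcal{D}_{k+1}\right),
  \qquad \Delta \ge 0,
\]

where \(\Delta(\mathcal{D}_k,\mathcal{D}_{k+1})\) measures the discrepancy between successive task distributions and vanishes when they coincide. Under such an assumed form, any persistent distribution shift drives the effective capacity downward over tasks, which is consistent with the abstract's claim that the network's ability to represent new tasks diminishes as incoming tasks diverge from earlier ones.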
Similar Papers
Dynamic Mixture of Experts Against Severe Distribution Shifts
Machine Learning (CS)
Lets computers learn new things without forgetting old ones.
Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective
Machine Learning (CS)
Helps computers learn new things without forgetting old ones.