BECAME: BayEsian Continual Learning with Adaptive Model MErging
By: Mei Li, Yuxiang Lu, Qinyan Dai, and more
Potential Business Impact:
Helps computers remember old and new lessons.
Continual Learning (CL) strives to learn incrementally across tasks while mitigating catastrophic forgetting. A key challenge in CL is balancing stability (retaining prior knowledge) and plasticity (learning new tasks). While representative gradient projection methods ensure stability, they often limit plasticity. Model merging techniques offer promising solutions, but prior methods typically rely on empirical assumptions and carefully selected hyperparameters. In this paper, we explore the potential of model merging to enhance the stability-plasticity trade-off, providing theoretical insights that underscore its benefits. Specifically, we reformulate the merging mechanism using Bayesian continual learning principles and derive a closed-form solution for the optimal merging coefficient that adapts to the diverse characteristics of tasks. To validate our approach, we introduce a two-stage framework named BECAME, which synergizes the expertise of gradient projection and adaptive merging. Extensive experiments show that our approach outperforms state-of-the-art CL methods and existing merging strategies.
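To give a feel for the merging mechanism the abstract describes, below is a minimal sketch of adaptive, per-parameter model merging in Python. It assumes diagonal precision (Fisher) estimates for the old and new task posteriors and uses the standard precision-weighted combination of Gaussian posteriors as the merging coefficient; the function and variable names are illustrative, and the paper's actual closed-form coefficient derived from its Bayesian continual learning formulation may differ in detail.

```python
import torch


def adaptive_merge(theta_old, theta_new, fisher_old, fisher_new, eps=1e-8):
    """Hypothetical precision-weighted merge of two task-specific models.

    theta_old / theta_new: dicts mapping parameter names to tensors
    (old-task and new-task weights). fisher_old / fisher_new: diagonal
    Fisher (precision) estimates with the same keys. Merging two diagonal
    Gaussian posteriors yields a per-parameter coefficient proportional
    to each posterior's precision.
    """
    merged = {}
    for name in theta_old:
        # Coefficient alpha in [0, 1]: larger old-task precision pulls the
        # merge toward the old weights (stability), smaller precision lets
        # the new-task weights dominate (plasticity).
        alpha = fisher_old[name] / (fisher_old[name] + fisher_new[name] + eps)
        merged[name] = alpha * theta_old[name] + (1.0 - alpha) * theta_new[name]
    return merged


# Usage sketch for the two-stage idea: after training the new task
# (e.g., with a gradient-projection method), merge it back into the
# previous model with the adaptive coefficients.
if __name__ == "__main__":
    theta_old = {"w": torch.tensor([1.0, 2.0])}
    theta_new = {"w": torch.tensor([3.0, 0.0])}
    fisher_old = {"w": torch.tensor([4.0, 0.5])}
    fisher_new = {"w": torch.tensor([1.0, 2.0])}
    print(adaptive_merge(theta_old, theta_new, fisher_old, fisher_new))
```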
Similar Papers
Continual Learning in Vision-Language Models via Aligned Model Merging
CV and Pattern Recognition
Keeps computer memory from forgetting old lessons.
Continual learning via probabilistic exchangeable sequence modelling
Machine Learning (Stat)
Teaches computers new things without forgetting old ones.
Adapt before Continual Learning
Machine Learning (CS)
Teaches computers to learn new things without forgetting.