HAM: Hierarchical Adapter Merging for Scalable Continual Learning
By: Eric Nuertey Coleman, Luigi Quarantiello, Samrat Mukherjee, and more
Potential Business Impact:
Helps computers learn new things without forgetting old ones.
Continual learning is an essential capability of human cognition, yet it poses significant challenges for current deep learning models. The primary issue is that new knowledge can interfere with previously learned information, causing the model to forget earlier knowledge in favor of the new, a phenomenon known as catastrophic forgetting. Although large pre-trained models can partially mitigate forgetting by leveraging their existing knowledge and over-parameterization, they often struggle when confronted with novel data distributions. Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, enable efficient adaptation to new knowledge. However, they still face challenges in scaling to dynamic learning scenarios and long sequences of tasks, since maintaining one adapter per task introduces complexity and increases the potential for interference. In this paper, we introduce Hierarchical Adapter Merging (HAM), a novel framework that dynamically combines adapters from different tasks during training. This approach enables HAM to scale effectively, allowing it to manage more tasks than competing baselines with improved efficiency. To achieve this, HAM maintains a fixed set of groups that hierarchically consolidate new adapters. For each task, HAM trains a low-rank adapter along with an importance scalar, then dynamically groups tasks based on adapter similarity. Within each group, adapters are pruned, scaled, and merged, facilitating transfer learning between related tasks. Extensive experiments on three vision benchmarks show that HAM significantly outperforms state-of-the-art methods, particularly as the number of tasks increases.
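To make the grouping-and-merging idea in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a cosine-similarity criterion over flattened LoRA updates, a greedy assignment to a fixed budget of groups, magnitude-based pruning, and an importance-weighted average as the merge; all of these specifics (function names, thresholds, the keep ratio) are illustrative assumptions rather than details taken from the paper.

```python
# Sketch of similarity-based grouping and prune/scale/merge of per-task LoRA adapters.
# Assumed details: cosine similarity, magnitude pruning, importance-weighted averaging.
import numpy as np

rng = np.random.default_rng(0)

def lora_delta(A, B):
    """Dense weight update represented by a low-rank adapter: delta_W = B @ A."""
    return B @ A

def cosine_sim(d1, d2):
    """Cosine similarity between two flattened adapter updates."""
    v1, v2 = d1.ravel(), d2.ravel()
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))

def prune_by_magnitude(delta, keep_ratio=0.5):
    """Keep only the largest-magnitude entries of the update (assumed pruning rule)."""
    k = int(delta.size * keep_ratio)
    thresh = np.partition(np.abs(delta).ravel(), -k)[-k]
    return np.where(np.abs(delta) >= thresh, delta, 0.0)

def merge_group(deltas, alphas):
    """Importance-weighted average of pruned adapter updates within one group."""
    alphas = np.asarray(alphas, dtype=float)
    alphas = alphas / alphas.sum()            # normalize the per-task importance scalars
    pruned = [prune_by_magnitude(d) for d in deltas]
    return sum(a * d for a, d in zip(alphas, pruned))

# Toy setup: four task adapters of rank r for a single d_out x d_in layer.
d_out, d_in, r = 16, 32, 4
adapters = [(rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))) for _ in range(4)]
alphas = [1.0, 0.8, 1.2, 0.5]                 # per-task importance scalars (toy values)
deltas = [lora_delta(A, B) for A, B in adapters]

# Greedy grouping under a fixed group budget: assign each adapter to the most
# similar existing group, or open a new group while the budget allows.
threshold, max_groups = 0.0, 2
groups = []                                   # each group is a list of task indices
for i, d in enumerate(deltas):
    sims = [max(cosine_sim(d, deltas[j]) for j in g) for g in groups]
    if groups and (max(sims) >= threshold or len(groups) >= max_groups):
        groups[int(np.argmax(sims))].append(i)
    else:
        groups.append([i])

# Consolidate each group into a single merged low-rank update.
merged = [merge_group([deltas[i] for i in g], [alphas[i] for i in g]) for g in groups]
print("groups:", groups, "| merged shapes:", [m.shape for m in merged])
```

In this sketch the group budget plays the role of the fixed set of groups mentioned in the abstract: once the budget is reached, every new adapter is folded into its closest existing group instead of growing the adapter count with the task count.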
Similar Papers
Continual Learning in Vision-Language Models via Aligned Model Merging
CV and Pattern Recognition
Keeps computer memory from forgetting old lessons.
Parameter Efficient Continual Learning with Dynamic Low-Rank Adaptation
Machine Learning (CS)
Teaches computers new things without forgetting old ones.