HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
By: Haiyang Guo, Fanhu Zeng, Ziwei Xiang, and more
Potential Business Impact:
Keeps AI smart as it learns new tasks.
Instruction tuning is widely used to improve a pre-trained Multimodal Large Language Model (MLLM) by training it on curated, task-specific datasets, enabling better comprehension of human instructions. However, it is infeasible to collect all possible instruction datasets simultaneously in real-world scenarios, so equipping MLLMs with continual instruction tuning is essential for maintaining their adaptability. Yet existing methods often trade memory efficiency for performance gains, significantly compromising overall efficiency. In this paper, we propose a task-specific expansion and task-general fusion framework based on how Centered Kernel Alignment (CKA) similarity varies across model layers when the model is trained on diverse datasets. Furthermore, we analyze the information leakage present in the existing benchmark and propose a new, more challenging benchmark to evaluate different methods fairly. Comprehensive experiments show a significant performance improvement of our method over existing state-of-the-art methods. Code and dataset are released at https://github.com/Ghy0501/HiDe-LLaVA.
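The abstract's layer-wise analysis rests on CKA similarity between layer representations. The sketch below is a minimal illustration (not the authors' released code) of linear CKA between the activations of a layer under two different task fine-tunes; the array shapes, variable names, and toy data are assumptions for demonstration only.

```python
# Minimal sketch, assuming linear CKA over (n_samples, dim) activation matrices.
# Intuition from the paper: layers whose CKA stays high across tasks behave as
# task-general, while layers with low cross-task CKA are task-specific.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, dim)."""
    # Center each feature matrix along the sample axis.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 normalized by ||X^T X||_F * ||Y^T Y||_F.
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))

# Toy usage with random stand-ins for layer activations on the same inputs
# under two hypothetical task-specific fine-tunes.
rng = np.random.default_rng(0)
acts_task_a = rng.normal(size=(256, 768))  # hypothetical activations, task A
acts_task_b = rng.normal(size=(256, 768))  # hypothetical activations, task B
print(f"cross-task CKA: {linear_cka(acts_task_a, acts_task_b):.3f}")
```

In practice one would feed the same probe inputs through the model before and after fine-tuning on each task and compare CKA per layer; how HiDe-LLaVA then expands or fuses modules is detailed in the paper itself.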
Similar Papers
Towards Alignment-Centric Paradigm: A Survey of Instruction Tuning in Large Language Models
Computation and Language
Teaches AI to follow instructions better.
Towards Automatic Continual Learning: A Self-Adaptive Framework for Continual Instruction Tuning
Computation and Language
Teaches AI new things without forgetting old ones.
LLaVA-c: Continual Improved Visual Instruction Tuning
Computer Vision and Pattern Recognition
Teaches AI to learn new things without forgetting.