HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
By: Haiyang Guo, Fanhu Zeng, Ziwei Xiang, and more
Potential Business Impact:
Keeps AI smart as it learns new tasks.
Instruction tuning is widely used to improve a pre-trained Multimodal Large Language Model (MLLM) by training it on curated, task-specific datasets, enabling better comprehension of human instructions. However, it is infeasible to collect all possible instruction datasets simultaneously in real-world scenarios, so equipping MLLMs with continual instruction tuning is essential for maintaining their adaptability. Yet existing methods often trade memory efficiency for performance gains, significantly compromising overall efficiency. In this paper, we propose a task-specific expansion and task-general fusion framework based on how Centered Kernel Alignment (CKA) similarity varies across model layers when the model is trained on diverse datasets. Furthermore, we analyze the information leakage present in the existing benchmark and propose a new, more challenging benchmark to evaluate different methods fairly. Comprehensive experiments show a significant performance improvement of our method over existing state-of-the-art methods. Code and dataset are released at https://github.com/Ghy0501/HiDe-LLaVA.
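The abstract's layer-wise analysis rests on CKA similarity between layer representations. The sketch below is a minimal illustration (not the authors' released code) of linear CKA between the activations of a layer under two different task fine-tunes; the array shapes, variable names, and toy data are assumptions for demonstration only.

```python
# Minimal sketch, assuming linear CKA over (n_samples, dim) activation matrices.
# Intuition from the paper: layers whose CKA stays high across tasks behave as
# task-general, while layers with low cross-task CKA are task-specific.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, dim)."""
    # Center each feature matrix along the sample axis.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 normalized by ||X^T X||_F * ||Y^T Y||_F.
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))

# Toy usage with random stand-ins for layer activations on the same inputs
# under two hypothetical task-specific fine-tunes.
rng = np.random.default_rng(0)
acts_task_a = rng.normal(size=(256, 768))  # hypothetical activations, task A
acts_task_b = rng.normal(size=(256, 768))  # hypothetical activations, task B
print(f"cross-task CKA: {linear_cka(acts_task_a, acts_task_b):.3f}")
```

In practice one would feed the same probe inputs through the model before and after fine-tuning on each task and compare CKA per layer; how HiDe-LLaVA then expands or fuses modules is detailed in the paper itself.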
Similar Papers
Towards Alignment-Centric Paradigm: A Survey of Instruction Tuning in Large Language Models
Computation and Language
Teaches AI to follow instructions better.
Towards Automatic Continual Learning: A Self-Adaptive Framework for Continual Instruction Tuning
Computation and Language
Teaches AI new things without forgetting old ones.
LLaVA-c: Continual Improved Visual Instruction Tuning
Computer Vision and Pattern Recognition
Teaches AI to learn new things without forgetting.