How to Teach Large Multimodal Models New Skills
By: Zhen Zhu, Yiming Gong, Yao Xiao, and more
Potential Business Impact:
Teaches AI new things without forgetting old ones.
How can we teach large multimodal models (LMMs) new skills without erasing prior abilities? We study sequential fine-tuning on five target skills while monitoring general ability on eight held-out benchmarks across three model families. We observe that apparent "forgetting" on held-out tasks after narrow fine-tuning can partly recover at later stages. We trace this behavior to a measurable shift in the output token distribution, manifested through a simple counting-bias probe that co-varies with forgetting. Guided by this picture, we identify two simple, robust tuning recipes that learn strongly while limiting drift: (i) updating only the self-attention projection layers, and (ii) updating only the MLP Gate&Up projections while freezing the Down projection. Across models and tasks, these choices deliver strong target gains while largely preserving held-out performance. Code is available at https://github.com/jessemelpolio/LMM_CL.
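To make the two recipes concrete, here is a minimal PyTorch sketch of selective unfreezing by parameter name. It is an illustration, not the authors' released code (see the repo above): it assumes LLaMA-style parameter names (`q_proj`/`k_proj`/`v_proj`/`o_proj` for self-attention, `gate_proj`/`up_proj`/`down_proj` for the MLP), and the `freeze_except` helper is our own naming; adjust the substrings to your model family.

```python
# Hedged sketch of the two tuning recipes, assuming LLaMA-style module names.
import torch
import torch.nn as nn

SA_PROJ = ("q_proj", "k_proj", "v_proj", "o_proj")  # recipe (i): self-attention projections
GATE_UP = ("gate_proj", "up_proj")                  # recipe (ii): MLP Gate&Up; down_proj stays frozen

def freeze_except(model: nn.Module, trainable_substrings) -> None:
    """Freeze every parameter whose name matches none of the given substrings."""
    for name, param in model.named_parameters():
        param.requires_grad = any(s in name for s in trainable_substrings)

# Usage (hypothetical model handle; any nn.Module with these names works):
# model = AutoModelForVision2Seq.from_pretrained("...")
# freeze_except(model, SA_PROJ)  # or: freeze_except(model, GATE_UP)
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-5)
```

The optimizer then only ever sees the unfrozen projections, so all other weights, including the MLP Down projection in recipe (ii), are untouched during sequential fine-tuning.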
Similar Papers
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model
Computation and Language
Teaches AI to understand pictures and words better.
When Continue Learning Meets Multimodal Large Language Model: A Survey
Machine Learning (CS)
Helps AI learn new things without forgetting old ones.
Continual Learning for Generative AI: From LLMs to MLLMs and Beyond
Machine Learning (CS)
AI remembers old lessons while learning new ones.