Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning
By: Sashuai Zhou, Hai Huang, Yan Xia
Potential Business Impact:
Makes AI understand different kinds of information together.
Multi-modal models excel in cross-modal tasks but are computationally expensive due to their billions of parameters. Parameter-efficient fine-tuning (PEFT) offers a solution by adding small trainable components while freezing pre-trained parameters. However, existing methods primarily focus on uni-modal processing, overlooking the critical modal fusion needed for multi-modal tasks. To fill this gap, we propose heterogeneous mixture-of-experts adapters that extend the traditional PEFT framework to support multi-modal expert combinations and improve information interaction. Additionally, our approach modifies the affine linear expert design to enable efficient modal fusion in a low-rank space, achieving competitive performance while fine-tuning only 5-8% of the parameters. Experiments across eight downstream tasks, including visual-audio and text-visual tasks, demonstrate the superior performance of our approach.
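The sketch below is a minimal, hypothetical illustration of the idea described in the abstract: a mixture-of-experts adapter whose experts operate in a low-rank space, with some experts fusing a second modality, attached as a residual module next to a frozen backbone. All class names (LowRankExpert, HeteroMoEAdapter), the alternating uni-modal/cross-modal expert layout, and the additive fusion rule are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowRankExpert(nn.Module):
    """Affine low-rank expert; optionally fuses a second modality.

    Hypothetical illustration, not the paper's exact expert design.
    """

    def __init__(self, dim: int, rank: int = 8, cross_modal: bool = False):
        super().__init__()
        self.cross_modal = cross_modal
        self.down = nn.Linear(dim, rank)   # project into the low-rank space
        self.up = nn.Linear(rank, dim)     # project back to the model dimension

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        h = self.down(x)
        if self.cross_modal:
            # Assumed fusion rule: add the other modality in the low-rank space.
            h = h + self.down(other)
        return self.up(F.gelu(h))


class HeteroMoEAdapter(nn.Module):
    """Mixes uni-modal and cross-modal low-rank experts via a learned router."""

    def __init__(self, dim: int, num_experts: int = 4, rank: int = 8):
        super().__init__()
        # Heterogeneous pool: alternate uni-modal and cross-modal experts (assumption).
        self.experts = nn.ModuleList(
            [LowRankExpert(dim, rank, cross_modal=(i % 2 == 1))
             for i in range(num_experts)]
        )
        self.router = nn.Linear(dim, num_experts)  # token-wise gating network

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        gates = F.softmax(self.router(x), dim=-1)               # (B, T, E)
        outs = torch.stack([e(x, other) for e in self.experts],
                           dim=-2)                              # (B, T, E, D)
        fused = (gates.unsqueeze(-1) * outs).sum(dim=-2)        # weighted expert mix
        return x + fused                                        # residual adapter output


# Usage: attach next to a frozen transformer block; only the adapter trains.
adapter = HeteroMoEAdapter(dim=768, num_experts=4, rank=8)
vision_tokens = torch.randn(2, 16, 768)   # (batch, tokens, dim)
audio_tokens = torch.randn(2, 16, 768)    # second modality; same shape assumed here
out = adapter(vision_tokens, audio_tokens)
```

Because every expert works in a rank-8 space, the trainable parameter count stays a small fraction of the frozen backbone, which is the property the abstract's 5-8% figure refers to.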
Similar Papers
Parameter-Efficient Routed Fine-Tuning: Mixture-of-Experts Demands Mixture of Adaptation Modules
Machine Learning (CS)
Makes AI smarter by letting it pick the best brain part.
PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models
Computation and Language
Makes big AI models learn new things cheaply.
MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models
Computation and Language
Makes AI smarter by mixing different learning parts.