LLaVA-NeuMT: Selective Layer-Neuron Modulation for Efficient Multilingual Multimodal Translation
By: Jingxuan Wei, Caijun Jia, Qi Chen, and more
Potential Business Impact:
Translates many languages better with pictures.
Multimodal Machine Translation (MMT) enhances translation quality by incorporating visual context, helping to resolve textual ambiguities. While existing MMT methods perform well in bilingual settings, extending them to multilingual translation remains challenging due to cross-lingual interference and ineffective parameter-sharing strategies. To address this, we propose LLaVA-NeuMT, a novel multimodal multilingual translation framework that explicitly models language-specific and language-agnostic representations to mitigate multilingual interference. Our approach consists of a layer selection mechanism that identifies the most informative layers for different language pairs and a neuron-level adaptation strategy that dynamically selects language-specific and agnostic neurons to improve translation quality while reducing redundancy. We conduct extensive experiments on the M3-Multi30K and M3-AmbigCaps datasets, demonstrating that LLaVA-NeuMT, while fine-tuning only 40% of the model parameters, surpasses full fine-tuning approaches and ultimately achieves SOTA results on both datasets. Our analysis further provides insights into the importance of selected layers and neurons in multimodal multilingual adaptation, offering an efficient and scalable solution to cross-lingual adaptation in multimodal translation.
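The abstract describes a two-stage selection: first pick the most informative layers per language pair, then unfreeze only the highest-scoring neurons inside those layers. A minimal sketch of that idea, assuming per-neuron importance scores (e.g. accumulated gradient magnitudes) are already available; all function and layer names here are hypothetical, not the paper's actual implementation:

```python
import numpy as np


def select_layers_and_neurons(importance, layer_fraction=0.4, neuron_fraction=0.5):
    """Hypothetical sketch of layer-then-neuron selective fine-tuning.

    importance: dict mapping layer name -> 1-D array of per-neuron importance
                scores for a given language pair (an assumption; the paper's
                actual scoring criterion may differ).
    Returns a dict of boolean masks: True marks a trainable neuron.
    """
    # Stage 1 (layer selection): rank layers by mean neuron importance and
    # keep only the top fraction, mirroring the 40%-of-parameters budget.
    layer_scores = {name: float(s.mean()) for name, s in importance.items()}
    n_keep = max(1, int(round(layer_fraction * len(importance))))
    kept = set(sorted(layer_scores, key=layer_scores.get, reverse=True)[:n_keep])

    # Stage 2 (neuron selection): inside kept layers, unfreeze only the
    # top-scoring neurons; all other layers stay fully frozen (all-False mask).
    masks = {}
    for name, scores in importance.items():
        mask = np.zeros_like(scores, dtype=bool)
        if name in kept:
            k = max(1, int(round(neuron_fraction * scores.size)))
            mask[np.argsort(scores)[-k:]] = True
        masks[name] = mask
    return masks
```

In a real training loop, such masks would be applied by zeroing gradients of frozen neurons before each optimizer step, so only the selected language-specific and language-agnostic units are updated.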
Similar Papers
Language-Specific Layer Matters: Efficient Multilingual Enhancement for Large Vision-Language Models
Computation and Language
Makes AI understand many languages better.
Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs
Computation and Language
Translates 60 languages better, including Chinese.
How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective
Computation and Language
Helps computers learn many languages better.