Conditions for Catastrophic Forgetting in Multilingual Translation
By: Danni Liu, Jan Niehues
Potential Business Impact:
Keeps AI smart in many languages.
Fine-tuning multilingual foundation models on specific languages often induces catastrophic forgetting, degrading performance on languages unseen in fine-tuning. While this phenomenon is widely documented, the literature presents fragmented results about when forgetting occurs. To address this ambiguity, we conduct a systematic empirical study using machine translation as a testbed to identify the conditions that trigger catastrophic forgetting in multilingual fine-tuning. Through controlled experiments across different model architectures, data scales, and fine-tuning approaches, we reveal that the relative scale between model and data size is a primary determinant of forgetting. Moreover, we demonstrate that a model's instruction-following ability is more critical for retaining multilingual knowledge than its architecture. Contrary to common assumptions, parameter-efficient fine-tuning offers no clear advantage over full fine-tuning in mitigating forgetting. Lastly, we show that cross-lingual alignment can mitigate forgetting while also facilitating positive transfer to unseen target languages.
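The abstract contrasts parameter-efficient fine-tuning with full fine-tuning. As a minimal sketch of that distinction, the snippet below illustrates a LoRA-style low-rank update, where the pretrained weight stays frozen and only a small delta is trained. The dimensions, rank, and initialization here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight

# Full fine-tuning updates every entry of W.
full_trainable = W.size                   # d_out * d_in = 48 parameters

# LoRA-style PEFT keeps W frozen and learns a low-rank delta: W' = W + B @ A.
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable
B = np.zeros((d_out, rank))               # trainable; zero-init so W' == W at start
lora_trainable = A.size + B.size          # rank * (d_in + d_out) = 28 parameters

x = rng.normal(size=(d_in,))
y_full = W @ x                            # base model output
y_lora = W @ x + B @ (A @ x)              # adapted output (identical at init)

print(full_trainable, lora_trainable)     # 48 28
assert np.allclose(y_full, y_lora)        # zero-init B => no change before training
```

The parameter savings grow with model size, which is why PEFT is often assumed to forget less; the paper's finding is that this assumption does not hold in practice for multilingual forgetting.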
Similar Papers
What Causes Knowledge Loss in Multilingual Language Models?
Computation and Language
Helps computers learn many languages without forgetting.
Catastrophic Forgetting in LLMs: A Comparative Analysis Across Language Tasks
Computation and Language
Keeps AI smart when learning new things.
Mitigating Catastrophic Forgetting in Continual Learning through Model Growth
Computation and Language
Keeps AI smart when learning new things.