Just Go Parallel: Improving the Multilingual Capabilities of Large Language Models

Published: June 16, 2025 | arXiv ID: 2506.13044v1

By: Muhammad Reza Qorib, Junyi Li, Hwee Tou Ng

Potential Business Impact:

Adds more languages to computer translators.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large language models (LLMs) have demonstrated impressive translation capabilities even without being explicitly trained on parallel data. This remarkable property has led some to believe that parallel data is no longer necessary for building multilingual language models. While some attribute this to the emergent abilities of LLMs due to scale, recent work suggests that it is actually caused by incidental bilingual signals present in the training data. Various methods have been proposed to maximize the utility of parallel data to enhance the multilingual capabilities of multilingual encoder-based and encoder-decoder language models. However, some decoder-based LLMs opt to ignore parallel data instead. In this work, we conduct a systematic study on the impact of adding parallel data on LLMs' multilingual capabilities, focusing specifically on translation and multilingual common-sense reasoning. Through controlled experiments, we demonstrate that parallel data can significantly improve LLMs' multilingual capabilities.

Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders

Computation and Language

Helps computers understand images in many languages.

30 Apr 2025 0

91%

Massively Multilingual Adaptation of Large Language Models Using Bilingual Translation Data

Computation and Language

Helps computers understand many more languages.

31 May 2025 2

91%

The Role of Mixed-Language Documents for Multilingual Large Language Model Pretraining

Computation and Language

Makes computers translate languages better with specific data.

1 Jan 2026 0

Country of Origin

🇸🇬 Singapore

Repos / Data Links

github.com github.com

Page Count

14 pages
