Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair

Published: March 25, 2025 | arXiv ID: 2503.20007v1

By: Maksim Borisov, Zhanibek Kozhirbayev, Valentin Malykh

Potential Business Impact:

Translates code-switched (mixed-language) text without needing labeled training examples.

Business Areas:
Translation Service, Professional Services

Machine translation for low-resource language pairs is a challenging task, and it becomes considerably harder when a speaker uses code-switching. We propose a method for building a machine translation model for the code-switched Kazakh-Russian language pair with no labeled data. Our method is based on generating synthetic data. Additionally, we present the first code-switched Kazakh-Russian parallel corpus and evaluation results, which include a model achieving 16.48 BLEU, nearly matching an existing commercial system on that metric and outperforming it in human evaluation.
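
The abstract does not say how the synthetic data is generated. As a rough illustration only, one simple way to synthesize code-switched text is lexical substitution: swap some Kazakh words for Russian equivalents drawn from a bilingual lexicon. The sketch below assumes that approach; it is not the authors' method, and `TOY_LEXICON`, `make_code_switched`, and `switch_prob` are hypothetical names introduced for this example.

```python
import random

# Hypothetical toy Kazakh -> Russian lexicon, for illustration only.
# A real setup would need a much larger dictionary and handling of
# Kazakh and Russian morphology, which this sketch ignores.
TOY_LEXICON = {
    "кітап": "книга",   # book
    "мектеп": "школа",  # school
    "дос": "друг",      # friend
}


def make_code_switched(tokens, lexicon, switch_prob, rng):
    """Return a copy of `tokens` where each token found in `lexicon`
    is replaced by its Russian entry with probability `switch_prob`.
    Tokens missing from the lexicon are always kept unchanged."""
    mixed = []
    for token in tokens:
        if token in lexicon and rng.random() < switch_prob:
            mixed.append(lexicon[token])
        else:
            mixed.append(token)
    return mixed


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sentence = "менің дос кітап оқиды".split()
    print(" ".join(make_code_switched(sentence, TOY_LEXICON, 0.5, rng)))
```

Setting `switch_prob` to 1.0 replaces every word found in the lexicon, and 0.0 leaves the sentence untouched; values in between control how heavily the synthetic text mixes the two languages.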

Country of Origin
🇰🇿 Kazakhstan, 🇷🇺 Russian Federation

Page Count
13 pages

Category
Computer Science:
Computation and Language