Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair
By: Maksim Borisov, Zhanibek Kozhirbayev, Valentin Malykh
Potential Business Impact:
Translates code-switched Kazakh-Russian text without needing labeled examples.
Machine translation for low-resource language pairs is a challenging task, and it can become extremely difficult when a speaker uses code-switching. We propose a method for building a machine translation model for the code-switched Kazakh-Russian language pair with no labeled data. Our method is based on generating synthetic data. Additionally, we present the first code-switched Kazakh-Russian parallel corpus and evaluation results, including a model that achieves 16.48 BLEU, nearly matching an existing commercial system and outperforming it in human evaluation.
Similar Papers
End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data
Computation and Language
Translates speech for languages with little data.
Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation
Computation and Language
Teaches computers to translate a rare Ukrainian dialect.
Minimal Pair-Based Evaluation of Code-Switching
Computation and Language
Helps computers understand how people switch languages.