Yes-MT's Submission to the Low-Resource Indic Language Translation Shared Task in WMT 2024
By: Yash Bhaskar, Parameswari Krishnamurthy
Potential Business Impact:
Improves translation of rare Indian languages using AI language models.
This paper presents the systems submitted by the Yes-MT team for the Low-Resource Indic Language Translation Shared Task at WMT 2024 (Pakray et al., 2024), focusing on translating between English and the Assamese, Mizo, Khasi, and Manipuri languages. The experiments explored various approaches, including fine-tuning pre-trained models like mT5 (Xue et al., 2020) and IndicBART (Dabre et al., 2021) in both multilingual and monolingual settings, LoRA (Hu et al., 2021) fine-tuning of IndicTrans2 (Gala et al., 2023), zero-shot and few-shot prompting (Brown et al., 2020) with large language models (LLMs) like Llama 3 (Dubey et al., 2024) and Mixtral 8x7B (Jiang et al., 2024), LoRA supervised fine-tuning of Llama 3 (Mecklenburg et al., 2024), and training Transformer models (Vaswani et al., 2017) from scratch. The results were evaluated on the WMT23 Low-Resource Indic Language Translation Shared Task test data using SacreBLEU (Post, 2018) and chrF (Popović, 2015), highlighting the challenges of low-resource translation and the potential of LLMs for these tasks, particularly with fine-tuning.
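The paper does not include code, but the LoRA fine-tuning approach mentioned above can be sketched with the Hugging Face transformers and peft libraries. The checkpoint name, target modules, and LoRA hyperparameters below are illustrative assumptions, not the authors' reported configuration.

```python
# Minimal sketch of LoRA fine-tuning a pre-trained translation model.
# The model ID, target_modules, and hyperparameters are assumptions for
# illustration, not the configuration used in the paper.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "ai4bharat/indictrans2-en-indic-1B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                                 # low-rank adapter dimension (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters remain trainable
```

Corpus-level evaluation with the two metrics named in the abstract could then follow this pattern, where hypotheses and references are lists of system outputs and reference translations:

```python
import sacrebleu

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}, chrF = {chrf.score:.2f}")
```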
Similar Papers
Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti
Computation and Language
Translates rare languages better than big AI.
Exploring Cross-Lingual Knowledge Transfer via Transliteration-Based MLM Fine-Tuning for Critically Low-resource Chakma Language
Computation and Language
Helps computers understand a rare language better.
LLMs for Low-Resource Dialect Translation Using Context-Aware Prompting: A Case Study on Sylheti
Computation and Language
Helps computers translate Sylheti dialect better.