Advancing Bangla Machine Translation Through Informal Datasets
By: Ayon Roy, Risat Rahaman, Sadat Shibly and more
Potential Business Impact:
Lets millions understand online info in Bangla.
Bangla is the sixth most widely spoken language globally, with approximately 234 million native speakers. However, progress in open-source Bangla machine translation remains limited. Most online resources are in English and often remain untranslated into Bangla, excluding millions from accessing essential information. Existing research in Bangla translation primarily focuses on formal language, neglecting the more commonly used informal language. This gap is largely due to the lack of pairwise Bangla-English data and of advanced translation models. If datasets and models can be enhanced to better handle natural, informal Bangla, millions of people will benefit from improved access to online information. In this research, we explore current state-of-the-art models and propose improvements to Bangla translation by developing a dataset from informal sources such as social media and conversational texts. This work aims to advance Bangla machine translation by focusing on informal language and improving accessibility for Bangla speakers in the digital world.
Similar Papers
Stemming -- The Evolution and Current State with a Focus on Bangla
Computation and Language
Helps computers understand Bengali words better.
Bangla Hate Speech Classification with Fine-tuned Transformer Models
Computation and Language
Helps computers find hate speech in Bengali.
BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation
Computation and Language
Helps computers understand science questions in Bangla.