Advancing Bangla Machine Translation Through Informal Datasets

Published: December 15, 2025 | arXiv ID: 2512.13487v1

By: Ayon Roy, Risat Rahaman, Sadat Shibly and more

Potential Business Impact:

Lets millions understand online info in Bangla.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Bangla is the sixth most widely spoken language globally, with approximately 234 million native speakers. However, progress in open-source Bangla machine translation remains limited. Most online resources are in English and often remain untranslated into Bangla, excluding millions from accessing essential information. Existing research in Bangla translation primarily focuses on formal language, neglecting the more commonly used informal language. This is largely due to the lack of pairwise Bangla-English data and advanced translation models. If datasets and models can be enhanced to better handle natural, informal Bangla, millions of people will benefit from improved online information access. In this research, we explore current state-of-the-art models and propose improvements to Bangla translation by developing a dataset from informal sources like social media and conversational texts. This work aims to advance Bangla machine translation by focusing on informal language translation and improving accessibility for Bangla speakers in the digital world.