RegSpeech12: A Regional Corpus of Bengali Spontaneous Speech Across Dialects
By: Md. Rezuwan Hassan, Azmol Hossain, Kanij Fatema and more
Potential Business Impact:
Helps computers understand different Bengali accents.
The Bengali language, spoken extensively across South Asia and among diasporic communities, exhibits considerable dialectal diversity shaped by geography, culture, and history. Phonological and pronunciation-based classifications broadly identify five principal dialect groups: Eastern Bengali, Manbhumi, Rangpuri, Varendri, and Rarhi. Within Bangladesh, further distinctions emerge through variation in vocabulary, syntax, and morphology, as observed in regions such as Chittagong, Sylhet, Rangpur, Rajshahi, Noakhali, and Barishal. Despite this linguistic richness, systematic research on the computational processing of Bengali dialects remains limited. This study seeks to document and analyze the phonetic and morphological properties of these dialects while exploring the feasibility of building computational models, particularly Automatic Speech Recognition (ASR) systems, tailored to regional varieties. Such efforts hold potential for applications in virtual assistants and broader language technologies, contributing to both the preservation of dialectal diversity and the advancement of inclusive digital tools for Bengali-speaking communities. The dataset created for this study is released for public use.
Similar Papers
BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects
Computation and Language
Helps Bengali speakers use voice assistants in their own dialects.
Are ASR foundation models generalized enough to capture features of regional dialects for low-resource languages?
Computation and Language
Helps computers understand different accents of a language.
A Comparative Analysis of Retrieval-Augmented Generation Techniques for Bengali Standard-to-Dialect Machine Translation Using LLMs
Computation and Language
Helps computers translate standard Bengali into its regional dialects.