BIDWESH: A Bangla Regional Based Hate Speech Detection Dataset
By: Azizul Hakim Fayaz, MD. Shorif Uddin, Rayhan Uddin Bhuiyan and more
Potential Business Impact:
Helps online platforms detect hate speech written in regional Bangla dialects.
Hate speech on digital platforms has become a growing concern globally, especially in linguistically diverse countries like Bangladesh, where regional dialects play a major role in everyday communication. Despite progress in hate speech detection for standard Bangla, existing datasets and systems fail to address the informal and culturally rich expressions found in dialects such as Barishal, Noakhali, and Chittagong. This oversight results in limited detection capability and biased moderation, leaving large sections of harmful content unaccounted for. To address this gap, this study introduces BIDWESH, the first multi-dialectal Bangla hate speech dataset, constructed by translating and annotating 9,183 instances from the BD-SHS corpus into three major regional dialects. Each entry was manually verified and labeled for hate presence, type (slander, gender, religion, call to violence), and target group (individual, male, female, group), ensuring linguistic and contextual accuracy. The resulting dataset provides a linguistically rich, balanced, and inclusive resource for advancing hate speech detection in Bangla. BIDWESH lays the groundwork for the development of dialect-sensitive NLP tools and contributes to equitable and context-aware content moderation in low-resource language settings.
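To make the three-level labeling scheme concrete, here is a minimal sketch of what a single BIDWESH entry might look like as a validated Python record. The label values come from the abstract; the class name `BidweshRecord`, the field names, the lowercase label spellings, and the rule that non-hate entries carry no type or target are hypothetical assumptions for illustration, not the dataset's published format.

```python
from dataclasses import dataclass
from typing import Optional

# Label sets taken from the abstract. The exact string spellings and field
# names are HYPOTHETICAL; check the released dataset for its real columns.
DIALECTS = {"barishal", "noakhali", "chittagong"}
HATE_TYPES = {"slander", "gender", "religion", "call_to_violence"}
TARGETS = {"individual", "male", "female", "group"}


@dataclass(frozen=True)
class BidweshRecord:
    """Hypothetical schema for one dialectal, annotated entry."""

    text: str                        # sentence translated into a regional dialect
    dialect: str                     # one of DIALECTS
    is_hate: bool                    # hate presence label
    hate_type: Optional[str] = None  # one of HATE_TYPES when is_hate is True
    target: Optional[str] = None     # one of TARGETS when is_hate is True

    def __post_init__(self) -> None:
        # Reject dialects outside the three covered by the dataset.
        if self.dialect not in DIALECTS:
            raise ValueError(f"unknown dialect: {self.dialect!r}")
        if self.is_hate:
            # Hateful entries must carry both a type and a target label.
            if self.hate_type not in HATE_TYPES:
                raise ValueError(f"unknown hate type: {self.hate_type!r}")
            if self.target not in TARGETS:
                raise ValueError(f"unknown target: {self.target!r}")
        elif self.hate_type is not None or self.target is not None:
            # ASSUMPTION: non-hate entries have no type/target annotation.
            raise ValueError("non-hate records should not carry type/target labels")
```

For example, `BidweshRecord(text="...", dialect="noakhali", is_hate=True, hate_type="religion", target="group")` constructs successfully, while passing `dialect="sylheti"` raises `ValueError`. Validating at construction time keeps malformed rows out of any training split built on top of such records.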
Similar Papers
BOISHOMMO: Holistic Approach for Bangla Hate Speech
Machine Learning (CS)
Helps computers spot online hate speech in Bangla.
BengaliSent140: A Large-Scale Bengali Binary Sentiment Dataset for Hate and Non-Hate Speech Classification
Computation and Language
Helps computers understand angry Bengali words better.
LLM-Based Multi-Task Bangla Hate Speech Detection: Type, Severity, and Target
Computation and Language
Helps stop online hate speech in Bangla.