Score: 1

Bangla Hate Speech Classification with Fine-tuned Transformer Models

Published: December 2, 2025 | arXiv ID: 2512.02845v1

By: Yalda Keivan Jafari, Krishno Dey

Potential Business Impact:

Enables automated detection of hate speech in Bengali (Bangla) text, supporting content moderation on social media platforms.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Hate speech recognition in low-resource languages remains a difficult problem due to insufficient datasets, orthographic heterogeneity, and linguistic variety. Bangla is spoken by more than 230 million people in Bangladesh and India (West Bengal). Despite the growing need for automated moderation on social media platforms, Bangla remains significantly under-represented in computational resources. In this work, we study Subtask 1A and Subtask 1B of the BLP 2025 Shared Task on hate speech detection. We reproduce the official baselines (e.g., Majority, Random, Support Vector Machine) and additionally evaluate Logistic Regression, Random Forest, and Decision Tree as baseline methods. We also fine-tune transformer-based models, namely DistilBERT, BanglaBERT, m-BERT, and XLM-RoBERTa, for hate speech classification. All transformer-based models except DistilBERT outperform the baseline methods on both subtasks. Among the transformer-based models, BanglaBERT achieves the best performance on both subtasks. Despite being smaller, BanglaBERT outperforms both m-BERT and XLM-RoBERTa, suggesting that language-specific pre-training is very important. Our results highlight the potential of, and need for, pre-trained language models for the low-resource Bangla language.
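The abstract mentions reproducing the official Majority and Random baselines. As a minimal sketch (not the authors' code; the label names below are hypothetical toy data, not the shared-task labels), a majority-class baseline and a macro-averaged F1 scorer can be written in plain Python:

```python
from collections import Counter

def majority_baseline(train_labels, test_size):
    """Predict the most frequent training label for every test example."""
    majority_class = Counter(train_labels).most_common(1)[0][0]
    return [majority_class] * test_size

def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy illustration with made-up labels:
train = ["none", "none", "abusive", "none", "hate"]
test_true = ["none", "hate", "none", "abusive"]
preds = majority_baseline(train, len(test_true))
print(preds)                            # ['none', 'none', 'none', 'none']
print(round(macro_f1(test_true, preds), 3))   # 0.222
```

Such a baseline ignores the input text entirely, which is why it serves only as a floor: the transformer models in the paper are compared against it (and the other baselines) to show genuine learning.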

Country of Origin
🇨🇦 Canada

Repos / Data Links

Page Count
10 pages

Category
Computer Science:
Computation and Language