BanglaMM-Disaster: A Multimodal Transformer-Based Deep Learning Framework for Multiclass Disaster Classification in Bangla
By: Ariful Islam, Md Rifat Hossen, Md. Mahmudul Arif and more
Potential Business Impact:
Helps classify disaster-related social media posts faster using text and pictures.
Natural disasters remain a major challenge for Bangladesh, so real-time monitoring and quick response systems are essential. In this study, we present BanglaMM-Disaster, an end-to-end deep learning-based multimodal framework for disaster classification in Bangla, using both textual and visual data from social media. We constructed a new dataset of 5,037 Bangla social media posts, each consisting of a caption and a corresponding image, annotated into one of nine disaster-related categories. The proposed model integrates transformer-based text encoders, including BanglaBERT, mBERT, and XLM-RoBERTa, with CNN backbones such as ResNet50, DenseNet169, and MobileNetV2, to process the two modalities. Using early fusion, the best model achieves 83.76% accuracy. This surpasses the best text-only baseline by 3.84% and the image-only baseline by 16.91%. Our analysis also shows reduced misclassification across all classes, with noticeable improvements for ambiguous examples. This work fills a key gap in Bangla multimodal disaster analysis and demonstrates the benefits of combining multiple data types for real-time disaster response in low-resource settings.
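The abstract names early fusion but gives no implementation details, so the following is only a minimal sketch of one common reading of that term: concatenating the text and image feature vectors before a single classifier. The random vectors stand in for encoder outputs (for example BanglaBERT and ResNet50 embeddings), and the feature sizes, weights, and the helper names `early_fusion` and `classify` are all hypothetical, not taken from the paper.

```python
import math
import random

NUM_CLASSES = 9  # nine disaster-related categories, per the abstract


def early_fusion(text_features, image_features):
    """Join the two modality vectors into one feature vector (concatenation)."""
    return list(text_features) + list(image_features)


def softmax(logits):
    """Turn raw scores into probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Apply one linear layer plus softmax to the fused vector."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy stand-ins for encoder outputs. Real encoders emit far larger vectors;
# these sizes are assumptions chosen only to keep the example small.
rng = random.Random(0)
TEXT_DIM, IMAGE_DIM = 8, 6
text_vec = [rng.uniform(-1, 1) for _ in range(TEXT_DIM)]
image_vec = [rng.uniform(-1, 1) for _ in range(IMAGE_DIM)]

fused = early_fusion(text_vec, image_vec)

# Untrained, randomly initialised classifier head (hypothetical).
weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(len(fused))]
    for _ in range(NUM_CLASSES)
]
biases = [0.0] * NUM_CLASSES

probs = classify(fused, weights, biases)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print(f"fused length: {len(fused)}, predicted class index: {predicted}")
```

Because the classifier sees both modalities in one vector, it can learn interactions between caption and image cues, which is one plausible reason a fused model could outperform either single-modality baseline. The weights here are untrained, so the predicted index carries no meaning.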