Multi-task Learning with Active Learning for Arabic Offensive Speech Detection
By: Aisha Alansari, Hamzah Luqman
Potential Business Impact:
Detects offensive Arabic speech online more accurately, even with little labeled data.
The rapid growth of social media has amplified the spread of offensive, violent, and vulgar speech, which poses serious societal and cybersecurity concerns. Detecting such content in Arabic text is particularly difficult because of limited labeled data, dialectal variation, and the language's inherent complexity. This paper proposes a novel framework that integrates multi-task learning (MTL) with active learning to enhance offensive speech detection in Arabic social media text. By jointly training on two auxiliary tasks, violent speech detection and vulgar speech detection, the model leverages shared representations to improve the accuracy of offensive speech detection. Our approach dynamically adjusts task weights during training to balance the contribution of each task and optimize performance. To address the scarcity of labeled data, we employ an active learning strategy that uses several uncertainty sampling techniques to iteratively select the most informative samples for model training. We also introduce weighted emoji handling to better capture semantic cues. Experimental results on the OSACT2022 dataset show that the proposed framework achieves a state-of-the-art macro F1-score of 85.42%, outperforming existing methods while using significantly fewer fine-tuning samples. These findings highlight the potential of integrating MTL with active learning for efficient and accurate offensive language detection in resource-constrained settings.
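The abstract does not say which uncertainty sampling techniques are used, so the following is only a minimal sketch of three common ones (least confidence, margin, and entropy), assuming the model outputs class probabilities for each unlabeled sample. The function names and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math

def least_confidence(probs):
    # Higher score = the model is less sure about its top prediction.
    return 1.0 - max(probs)

def margin_score(probs):
    # Small gap between the two most likely classes = high uncertainty.
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)

def entropy(probs):
    # Shannon entropy of the predicted distribution (natural log).
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pool_probs, k, score_fn=entropy):
    # Rank unlabeled samples by uncertainty and return the indices of the
    # top k, which would be sent for annotation in the next round.
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score_fn(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy pool: predicted [not offensive, offensive] probabilities (made up).
pool_probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]
selected = select_most_uncertain(pool_probs, k=2)
print(selected)
```

In an active learning loop, the selected samples would be labeled, added to the training set, and the model fine-tuned again before the next selection round. For binary predictions the three scores produce the same ranking; they can differ once there are more than two classes.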
Similar Papers
A Multi-Task Benchmark for Abusive Language Detection in Low-Resource Settings
Computation and Language
Helps Tigrinya speakers fight online hate speech.
LLM-Based Multi-Task Bangla Hate Speech Detection: Type, Severity, and Target
Computation and Language
Helps stop online hate speech in Bangla.
Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models
Computation and Language
Stops online hate speech in Urdu, English, and Spanish.