Improving Implicit Hate Speech Detection via a Community-Driven Multi-Agent Framework

Published: January 14, 2026 | arXiv ID: 2601.09342v1

By: Ewelina Gajewska, Katarzyna Budzynska, Jarosław A Chudziak

This work proposes a contextualised detection framework for implicitly hateful speech, implemented as a multi-agent system comprising a central Moderator Agent and dynamically constructed Community Agents representing specific demographic groups. Our approach explicitly integrates socio-cultural context from publicly available knowledge sources, enabling identity-aware moderation that surpasses state-of-the-art prompting methods (zero-shot, few-shot, and chain-of-thought prompting) and alternative approaches on the challenging ToxiGen dataset. We strengthen the technical rigour of performance evaluation by adopting balanced accuracy as the central metric of classification fairness, since it accounts for the trade-off between true positive and true negative rates. We demonstrate that our community-driven consultative framework significantly improves both classification accuracy and fairness across all target groups.
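The balanced accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard binary definition, the mean of the true positive rate and the true negative rate; the paper's own evaluation code is not shown here, and the function name and toy labels below are invented for illustration only.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of true positive rate and true negative rate.

    Labels are assumed to be binary: 1 = hateful, 0 = benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must appear in y_true")

    tpr = tp / positives  # share of hateful items that were caught
    tnr = tn / negatives  # share of benign items that were left alone
    return (tpr + tnr) / 2


# Toy data (hypothetical): 8 benign items, 2 hateful items,
# and a degenerate classifier that labels everything benign.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

On this imbalanced toy set, plain accuracy rewards the classifier that never flags anything, while balanced accuracy drops to chance level because the true positive rate is zero.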

See, Explain, and Intervene: A Few-Shot Multimodal Agent Framework for Hateful Meme Moderation

Computation and Language

Stops mean online pictures before they spread.

8 Jan 2026

Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation

Computation and Language

Stops online hate speech, even new words.

15 Feb 2025

Leveraging LLMs for Context-Aware Implicit Textual and Multimodal Hate Speech Detection

Computation and Language

Helps computers spot hateful messages better.

17 Oct 2025
