SynHate: Detecting Hate Speech in Synthetic Deepfake Audio
By: Rishabh Ranjan, Kishan Pipariya, Mayank Vatsa and more
Potential Business Impact:
Finds fake hate speech in any language.
The rise of deepfake audio and hate speech, powered by advanced text-to-speech, threatens online safety. We present SynHate, the first multilingual dataset for detecting hate speech in synthetic audio, spanning 37 languages. SynHate uses a novel four-class scheme: Real-normal, Real-hate, Fake-normal, and Fake-hate. Built from the MuTox and ADIMA datasets, it captures diverse hate speech patterns globally and in India. We evaluate five leading self-supervised models (Whisper-small/medium, XLS-R, AST, mHuBERT) and find notable performance differences by language, with Whisper-small performing best overall. Cross-dataset generalization remains a challenge. By releasing SynHate and baseline code, we aim to advance robust, culturally sensitive, and multilingual solutions against synthetic hate speech. The dataset is available at https://www.iab-rubric.org/resources.
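The four-class scheme can be read as the product of two binary attributes: whether the audio is real or synthetic, and whether the content is normal or hateful. The sketch below is only an illustration of that reading. The names `SynHateLabel` and `to_label` are hypothetical and are not taken from the released baseline code, whose actual label encoding may differ.

```python
from enum import Enum


class SynHateLabel(Enum):
    """The four classes named in the abstract (this encoding is an assumption)."""
    REAL_NORMAL = "Real-normal"
    REAL_HATE = "Real-hate"
    FAKE_NORMAL = "Fake-normal"
    FAKE_HATE = "Fake-hate"


def to_label(is_synthetic: bool, is_hate: bool) -> SynHateLabel:
    """Combine the two binary attributes into one of the four classes."""
    source = "Fake" if is_synthetic else "Real"
    content = "hate" if is_hate else "normal"
    # Enum lookup by value, e.g. "Fake-hate" -> SynHateLabel.FAKE_HATE
    return SynHateLabel(f"{source}-{content}")


if __name__ == "__main__":
    # Enumerate all four combinations of the two attributes.
    for synthetic in (False, True):
        for hate in (False, True):
            print(synthetic, hate, to_label(synthetic, hate).value)
```

Treating the task as a single four-way label, rather than two separate binary predictions, means a classifier has to get both the authenticity and the content of a clip right at once.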
Similar Papers
Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages
Sound
Finds hate speech in fake voices, even new ones.
A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English
Computation and Language
Helps computers find different kinds of online hate.
Advancing Hate Speech Detection with Transformers: Insights from the MetaHate
Machine Learning (CS)
Finds mean online words faster than before.