ToxicTAGS: Decoding Toxic Memes with Rich Tag Annotations
By: Subhankar Swain, Naquee Rizwan, Nayandeep Deb and more
Potential Business Impact:
Helps stop mean memes online.
The 2025 Global Risks Report identifies state-based armed conflict and societal polarisation among the most pressing global threats, with social media playing a central role in amplifying toxic discourse. Memes, as a widely used mode of online communication, often serve as vehicles for spreading harmful content. However, limitations in data accessibility and the high cost of dataset curation hinder the development of robust meme moderation systems. To address this challenge, we introduce a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification into toxic and normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive. A key feature of this dataset is that it is enriched with auxiliary metadata in the form of socially relevant tags, which enhance the context of each meme. Because most in-the-wild memes do not come with tags, we also propose a tag generation module that produces socially grounded tags. Experimental results show that incorporating these tags substantially enhances the performance of state-of-the-art VLMs on detection tasks. Our contributions offer a novel and scalable foundation for improved content moderation in multimodal online environments.
Similar Papers
Detecting and Understanding Hateful Contents in Memes Through Captioning and Visual Question-Answering
CV and Pattern Recognition
Finds hate hidden in pictures and words.
Defining, Understanding, and Detecting Online Toxicity: Challenges and Machine Learning Approaches
Computation and Language
Finds and stops bad online words.
Mapping Toxic Comments Across Demographics: A Dataset from German Public Broadcasting
Computation and Language
Helps online spaces understand age differences in bad talk.