The Big Ban Theory: A Pre- and Post-Intervention Dataset of Online Content Moderation Actions
By: Aldo Cerulli, Lorenzo Cima, Benedetta Tessa and more
Potential Business Impact:
Helps online sites remove bad posts fairly.
Online platforms rely on moderation interventions to curb harmful behavior such as hate speech, toxicity, and the spread of mis- and disinformation. Yet research on the effects and possible biases of such interventions faces multiple limitations. For example, existing works frequently focus on one or a few interventions, due to the absence of comprehensive datasets. As a result, researchers must typically collect the necessary data for each new study, which limits opportunities for systematic comparisons. To overcome these challenges, we introduce The Big Ban Theory (TBBT), a large dataset of moderation interventions. TBBT covers 25 interventions of varying type, severity, and scope, comprising in total over 339K users and nearly 39M posted messages. For each intervention, we provide standardized metadata and pseudonymized user activity collected three months before and after its enforcement, enabling consistent and comparable analyses of intervention effects. In addition, we provide a descriptive exploratory analysis of the dataset, along with several use cases of how it can support research on content moderation. With this dataset, we aim to support researchers studying the effects of moderation interventions and to promote more systematic, reproducible, and comparable research. TBBT is publicly available at: https://doi.org/10.5281/zenodo.18245670.
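To make the pre- and post-intervention design concrete, the following is a minimal sketch of the kind of comparison such data could support. It does not use the actual TBBT schema: the record layout (pseudonymized user ID plus posting date), the 90-day stand-in for "three months", the sample posts, the intervention date, and the function names are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: 90 days approximates the three-month window on each side.
WINDOW = timedelta(days=90)


def pre_post_counts(posts, intervention_day, window=WINDOW):
    """Count each user's posts in the window before and after the intervention.

    `posts` is an iterable of (user_id, posting_date) pairs; this layout is
    hypothetical and may not match the real dataset.
    """
    counts = {}
    for user, day in posts:
        delta = day - intervention_day
        if -window <= delta < timedelta(0):
            period = "pre"
        elif timedelta(0) <= delta < window:
            period = "post"
        else:
            continue  # outside the observation window
        user_counts = counts.setdefault(user, {"pre": 0, "post": 0})
        user_counts[period] += 1
    return counts


def relative_change(pre, post):
    """Relative change in activity; None when there is no pre-period activity."""
    if pre == 0:
        return None
    return (post - pre) / pre


# Made-up sample data, not taken from TBBT.
posts = [
    ("u1", date(2020, 5, 1)),
    ("u1", date(2020, 6, 20)),
    ("u1", date(2020, 7, 10)),
    ("u2", date(2020, 6, 1)),
    ("u2", date(2020, 1, 1)),  # falls outside the window and is ignored
]
intervention_day = date(2020, 6, 29)  # illustrative date only

counts = pre_post_counts(posts, intervention_day)
changes = {u: relative_change(c["pre"], c["post"]) for u, c in counts.items()}
```

Because every intervention is described with the same window, a per-user change computed this way could in principle be compared across interventions, although a real analysis would also need to account for baseline trends and for users who stop posting entirely.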
Similar Papers
Evaluating Moderation in Online Social Network
Social and Information Networks
Makes online platforms stop bad messages faster.
Modelling the Spread of Toxicity and Exploring its Mitigation on Online Social Networks
Social and Information Networks
Bots reduce online hate speech by changing its message.
Before the Outrage: Challenges and Advances in Predicting Online Antisocial Behavior
Computation and Language
Stops online meanness before it starts.