Evaluating Online Moderation Via LLM-Powered Counterfactual Simulations
By: Giacomo Fidone, Lucia Passaro, Riccardo Guidotti
Potential Business Impact:
Tests which moderation strategies best reduce toxic behavior online.
Online Social Networks (OSNs) widely adopt content moderation to mitigate the spread of abusive and toxic discourse. Nonetheless, the real effectiveness of moderation interventions remains unclear due to the high cost of data collection and limited experimental control. The latest developments in Natural Language Processing pave the way for a new evaluation approach. Large Language Models (LLMs) can be successfully leveraged to enhance Agent-Based Modeling and simulate human-like social behavior with an unprecedented degree of believability. Yet, existing tools do not support simulation-based evaluation of moderation strategies. We fill this gap by designing an LLM-powered simulator of OSN conversations enabling a parallel, counterfactual simulation where toxic behavior is influenced by moderation interventions, keeping all else equal. We conduct extensive experiments, unveiling the psychological realism of OSN agents, the emergence of social contagion phenomena, and the superior effectiveness of personalized moderation strategies.
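To make the parallel, counterfactual setup concrete, here is a minimal conceptual sketch in Python, not the authors' implementation: two conversation branches evolve from the same seed state with identical randomness, and only one branch applies a moderation intervention to toxic posts, so any divergence between branches can be attributed to the intervention. All function names (generate_reply, is_toxic, moderate) are hypothetical placeholders for an LLM agent call, a toxicity classifier, and a moderation action.

```python
import random

def generate_reply(agent, thread, rng):
    # Placeholder for an LLM call conditioned on the agent's persona and the thread so far.
    return f"{agent}: reply to '{thread[-1][:30]}' (tone={rng.random():.2f})"

def is_toxic(post):
    # Placeholder toxicity classifier.
    return float(post.rsplit("tone=", 1)[-1].rstrip(")")) > 0.8

def moderate(post):
    # Example intervention: attach a moderation notice to the toxic post.
    return post + " [moderated]"

def simulate(agents, seed_post, steps, seed, intervene):
    rng = random.Random(seed)            # identical randomness in both branches
    thread = [seed_post]
    for step in range(steps):
        agent = agents[step % len(agents)]
        post = generate_reply(agent, thread, rng)
        if intervene and is_toxic(post):
            post = moderate(post)        # the only difference between branches
        thread.append(post)
    return thread

agents = ["user_a", "user_b", "user_c"]
factual = simulate(agents, "seed post", steps=6, seed=42, intervene=False)
counterfactual = simulate(agents, "seed post", steps=6, seed=42, intervene=True)
# Comparing the two threads isolates the causal effect of the moderation intervention,
# holding agents, seed content, and randomness fixed.
```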
Similar Papers
Simulating Online Social Media Conversations on Controversial Topics Using AI Agents Calibrated on Real-World Data
Social and Information Networks
AI agents calibrated on real-world data can realistically simulate online conversations.
Social Simulations with Large Language Model Risk Utopian Illusion
Computation and Language
LLM-based social simulations risk depicting unrealistically agreeable behavior.
Are LLM-Powered Social Media Bots Realistic?
Social and Information Networks
Shows how LLM-powered bots differ from real users online.