Strategic Filtering for Content Moderation: Free Speech or Free of Distortion?
By: Saba Ahmadi, Avrim Blum, Haifeng Xu, and more
Potential Business Impact:
Helps social media platforms stop harmful posts without blocking legitimate ones.
User-generated content (UGC) on social media platforms is vulnerable to incitement and manipulation, necessitating effective regulation. To address these challenges, platforms often deploy automated content moderators that evaluate the harmfulness of UGC and filter out content violating established guidelines. Such moderation inevitably provokes strategic responses from users, who adapt their content to stay within the guidelines. These phenomena call for a careful balance between two goals: (1) preserving freedom of speech, by minimizing the restriction of expression; and (2) reducing social distortion, measured by the total amount of content manipulation. We tackle this problem through the lens of mechanism design, optimizing the trade-off between minimizing social distortion and maximizing free speech. Although computing the optimal trade-off is NP-hard, we propose practical methods that approximate the optimal solution. We also provide generalization guarantees that determine how much finite offline data is required to approximate the optimal moderator.
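To make the trade-off concrete, below is a minimal toy sketch in Python. It is not the paper's actual formulation: the one-dimensional harmfulness scores, the simple threshold moderator, and the fixed manipulation budget are all hypothetical illustration choices. Users just above the threshold game the filter (contributing social distortion), users far above it are blocked (counted against free speech), and sweeping the threshold traces how the two quantities move against each other.

import numpy as np

rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 1.0, size=10_000)  # hypothetical harmfulness scores
budget = 0.15                                # hypothetical manipulation budget

def evaluate(tau: float) -> tuple[float, float]:
    """Return (total distortion, fraction blocked) for threshold tau."""
    gap = scores - tau
    manipulators = (gap > 0) & (gap <= budget)  # users who edit down to tau
    blocked = gap > budget                      # users who cannot comply
    distortion = gap[manipulators].sum()        # total content manipulation
    restriction = blocked.mean()                # speech lost to the filter
    return float(distortion), float(restriction)

# Trace the trade-off curve over candidate thresholds.
for tau in np.linspace(0.5, 0.95, 10):
    d, r = evaluate(tau)
    print(f"tau={tau:.2f}  distortion={d:8.1f}  blocked={r:.1%}")

In this toy model, raising the threshold blocks less speech but invites more gaming near the boundary; the paper's mechanism-design formulation optimizes this balance over far richer classes of moderators than a single threshold.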
Similar Papers
Content Moderation in TV Search: Balancing Policy Compliance, Relevance, and User Experience
Information Retrieval
Keeps TV search results from showing harmful or inappropriate content.
Towards Safer Social Media Platforms: Scalable and Performant Few-Shot Harmful Content Moderation Using Large Language Models
Computation and Language
Uses large language models to spot harmful posts at scale from just a few examples.
Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing
Computation and Language
Shows how and why posts get removed in different countries.