Toxicity Ahead: Forecasting Conversational Derailment on GitHub
By: Mia Mohammad Imran, Robert Zita, Rahat Rizvi Rahman, and more
Potential Business Impact:
Spots online chats likely to turn mean before they do.
Toxic interactions in Open Source Software (OSS) communities reduce contributor engagement and threaten project sustainability. Preventing such toxicity before it emerges requires a clear understanding of how harmful conversations unfold, yet most proactive moderation strategies are manual, demanding significant time and effort from community maintainers. To support more scalable approaches, we curate a dataset of 159 derailed toxic threads and 207 non-toxic threads from GitHub discussions. Our analysis reveals that toxicity can be forecast from tension triggers, sentiment shifts, and specific conversational patterns. We present a novel Large Language Model (LLM)-based framework for predicting conversational derailment on GitHub using a two-step prompting pipeline: first, we generate Summaries of Conversation Dynamics (SCDs) via Least-to-Most (LtM) prompting; then we use these summaries to estimate the likelihood of derailment. Evaluated on Qwen and Llama models, our LtM strategy achieves F1-scores of 0.901 and 0.852, respectively, at a decision threshold of 0.3, outperforming established NLP baselines for conversation derailment. External validation on a dataset of 308 GitHub issue threads (65 toxic, 243 non-toxic) yields an F1-score of up to 0.797. Our findings demonstrate the effectiveness of structured LLM prompting for early detection of conversational derailment in OSS, enabling proactive and explainable moderation.
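To make the pipeline concrete, here is a minimal Python sketch of the two-step prompting approach described in the abstract. It is an illustration under assumptions, not the authors' implementation: `call_llm` is a hypothetical stand-in for a Qwen or Llama chat endpoint, and the prompt wording is invented. Only the two-step structure (an LtM-style summary, then a likelihood estimate) and the 0.3 decision threshold come from the abstract.

```python
# Hedged sketch of a two-step LtM prompting pipeline for derailment
# forecasting. `call_llm` is a hypothetical placeholder, not a real API.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; wire this to your own Qwen/Llama client."""
    raise NotImplementedError("Replace with a real model endpoint.")

def summarize_conversation_dynamics(thread: list[str]) -> str:
    """Step 1: Least-to-Most prompting -- answer simpler sub-questions
    about the thread, then synthesize a Summary of Conversation
    Dynamics (SCD) from those answers."""
    transcript = "\n".join(f"[{i}] {msg}" for i, msg in enumerate(thread))
    sub_answers = []
    for question in (
        "What tension triggers, if any, appear in this thread?",
        "How does the participants' sentiment shift across turns?",
        "What conversational patterns (e.g., repeated blame) recur?",
    ):
        sub_answers.append(call_llm(f"{transcript}\n\nQ: {question}\nA:"))
    return call_llm(
        "Using these observations, write a short summary of the "
        "conversation dynamics:\n" + "\n".join(sub_answers)
    )

def derailment_likelihood(scd: str) -> float:
    """Step 2: ask the model for a 0-1 likelihood of derailment."""
    raw = call_llm(
        "Given this summary of conversation dynamics:\n"
        f"{scd}\n\nOn a scale from 0 to 1, how likely is this "
        "conversation to derail into toxicity? Answer with one number."
    )
    return float(raw.strip())

def predict_derailment(thread: list[str], threshold: float = 0.3) -> bool:
    """Flag a thread as at risk; 0.3 matches the paper's threshold."""
    scd = summarize_conversation_dynamics(thread)
    return derailment_likelihood(scd) >= threshold
```

The Least-to-Most step matters here because it decomposes the thread into the signals the paper identifies (tension triggers, sentiment shifts, conversational patterns) before asking the model for a single derailment judgment.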
Similar Papers
Understanding and Predicting Derailment in Toxic Conversations on GitHub
Software Engineering
Stops online arguments before they get mean.
Forecasting Communication Derailments Through Conversation Generation
Computation and Language
Predicts arguments before they happen.
Combating Toxic Language: A Review of LLM-Based Strategies for Software Engineering
Machine Learning (CS)
Helps curb toxic language in software communities.