Bridging the Gap: In-Context Learning for Modeling Human Disagreement
By: Benedetta Muscato, Yue Li, Gizem Gezici, and more
Potential Business Impact:
Helps computers understand when people disagree.
Large Language Models (LLMs) have shown strong performance on NLP classification tasks. However, they typically rely on aggregated labels, often obtained via majority voting, which can obscure the human disagreement inherent in subjective annotations. This study examines whether LLMs can capture multiple perspectives and reflect annotator disagreement in subjective tasks such as hate speech and offensive language detection. We use in-context learning (ICL) in zero-shot and few-shot settings, evaluating four open-source LLMs across three label modeling strategies: aggregated hard labels, disaggregated hard labels, and disaggregated soft labels. In few-shot prompting, we assess demonstration selection methods based on textual similarity (BM25, PLM-based), annotation disagreement (entropy), a combined ranking, and example ordering strategies (random vs. curriculum-based). Results show that multi-perspective generation is viable in zero-shot settings, while few-shot setups often fail to capture the full spectrum of human judgments. Prompt design and demonstration selection notably affect performance, whereas example ordering has limited impact. These findings highlight the challenges of modeling subjectivity with LLMs and the importance of building more perspective-aware, socially intelligent models.
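To make the demonstration selection idea concrete, below is a minimal Python sketch of one way such a pipeline could work. It is not the authors' implementation: the label set, the alpha weighting of the combined ranking, and the lexical_similarity function (a simple token-overlap stand-in for BM25 or PLM-based retrieval) are all illustrative assumptions. It shows how disaggregated annotations can be turned into soft labels, how annotation entropy quantifies disagreement, and how similarity and disagreement scores could be combined to pick few-shot examples.

```python
import math
from collections import Counter

# Hypothetical 3-way label space; the paper's actual label sets may differ.
LABELS = ("hate", "offensive", "neutral")

def soft_label(annotations):
    """Turn disaggregated annotator votes into a soft label distribution."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {lab: counts.get(lab, 0) / total for lab in LABELS}

def annotation_entropy(annotations):
    """Shannon entropy of the vote distribution (higher = more disagreement)."""
    dist = soft_label(annotations)
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def lexical_similarity(query, candidate):
    """Toy token-overlap similarity, standing in for BM25 / PLM-based retrieval."""
    q, c = set(query.lower().split()), set(candidate.lower().split())
    return len(q & c) / max(len(q | c), 1)

def select_demonstrations(query, pool, k=4, alpha=0.5):
    """Combined ranking: alpha * similarity + (1 - alpha) * normalized disagreement."""
    max_entropy = math.log2(len(LABELS))  # upper bound used for normalization
    scored = []
    for text, annotations in pool:
        score = (alpha * lexical_similarity(query, text)
                 + (1 - alpha) * annotation_entropy(annotations) / max_entropy)
        scored.append((score, text, annotations))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(text, soft_label(anns)) for _, text, anns in scored[:k]]

# Usage: pick few-shot demonstrations (with soft labels) for a new test post.
pool = [
    ("example post one", ["hate", "hate", "neutral"]),
    ("example post two", ["offensive", "neutral", "neutral"]),
    ("example post three", ["neutral", "neutral", "neutral"]),
]
print(select_demonstrations("new test post", pool, k=2))
```

The selected demonstrations, together with their soft label distributions, would then be formatted into the few-shot prompt; ordering them randomly or by increasing disagreement would correspond to the random vs. curriculum-based strategies compared in the study.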
Similar Papers
Opt-ICL at LeWiDi-2025: Maximizing In-Context Signal from Rater Examples via Meta-Learning
Computation and Language
Teaches computers to understand when people disagree.
Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels
Computation and Language
AI learns by sticking to its original understanding.
In-Context Bias Propagation in LLM-Based Tabular Data Generation
Machine Learning (CS)
AI can accidentally create unfair data.