System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection
By: Binglin Wu, Jiaxiu Zou, Xianneng Li
Potential Business Impact:
Detects hidden and implicit hate speech on Chinese social media.
The proliferation of hate speech on Chinese social media poses urgent societal risks, yet traditional systems struggle to decode context-dependent rhetorical strategies and evolving slang. To bridge this gap, we propose a novel three-stage LLM-based framework: Prompt Engineering, Supervised Fine-tuning, and LLM Merging. First, context-aware prompts are designed to guide LLMs in extracting implicit hate patterns. Next, task-specific features are integrated during supervised fine-tuning to enhance domain adaptation. Finally, merging fine-tuned LLMs improves robustness against out-of-distribution cases. Evaluations on the STATE-ToxiCN benchmark validate the framework's effectiveness, demonstrating superior performance over baseline methods in detecting fine-grained hate speech.
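The abstract does not specify which merge algorithm the final stage uses. A minimal sketch of one common choice, uniform parameter-wise weight averaging of fine-tuned checkpoints (a "model soup"), is below; the `merge_checkpoints` helper and the dict-of-floats stand-ins for real tensor state dicts are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of an "LLM Merging" stage via parameter-wise weighted
# averaging of fine-tuned checkpoints. Real checkpoints hold tensors
# (e.g. a PyTorch state_dict); plain floats keep the sketch self-contained.

def merge_checkpoints(state_dicts, weights=None):
    """Return a parameter-wise weighted average of the given state dicts.

    Assumes all state dicts share the same parameter names; defaults to
    a uniform average when no weights are given.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Toy example: two fine-tuned "checkpoints" with two scalar parameters.
ckpt_a = {"layer.weight": 0.2, "layer.bias": 1.0}
ckpt_b = {"layer.weight": 0.6, "layer.bias": 3.0}
merged = merge_checkpoints([ckpt_a, ckpt_b])
# Uniform average: layer.weight -> 0.4, layer.bias -> 2.0
```

With tensors, the same loop works by replacing the scalar arithmetic with tensor operations; the design point is that averaging models fine-tuned from a shared base tends to smooth out idiosyncratic errors, which is consistent with the robustness-to-out-of-distribution motivation stated above.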
Similar Papers
Leveraging LLMs for Context-Aware Implicit Textual and Multimodal Hate Speech Detection
Computation and Language
Helps computers detect implicit hateful messages more accurately.
Labels or Input? Rethinking Augmentation in Multimodal Hate Detection
CV and Pattern Recognition
Detects hateful memes by analyzing both images and text.
Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study
Computation and Language
Detects hate speech across many languages with little or no labeled data.