Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate
By: Mikel K. Ngueajio, Flor Miriam Plaza-del-Arco, Yi-Ling Chung, and more
Potential Business Impact:
Shows how to make AI responses to online hate more readable, empathetic, and safe.
Automated counter-narratives (CNs) offer a promising strategy for mitigating online hate speech, yet concerns about their affective tone, accessibility, and ethical risks remain. We propose a framework for evaluating Large Language Model (LLM)-generated CNs across four dimensions: persona framing, verbosity and readability, affective tone, and ethical robustness. Using GPT-4o-Mini, Cohere's CommandR-7B, and Meta's LLaMA 3.1-70B, we assess three prompting strategies on the MT-CONAN and HatEval datasets. Our findings reveal that LLM-generated CNs are often verbose and written at a college reading level, limiting their accessibility. While emotionally guided prompts yield more empathetic and readable responses, questions about their safety and effectiveness persist.
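The abstract does not reproduce the paper's exact prompts or scoring pipeline, but a minimal sketch of the core idea (persona-framed CN generation followed by a verbosity and readability check) could look like the following. The persona text, prompt wording, choice of the `openai` client, and use of the `textstat` package are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: persona-guided counter-narrative (CN) generation with a
# readability check. Assumes an OpenAI-compatible client ("gpt-4o-mini")
# and the `textstat` package; the persona and prompts below are
# illustrative, not the paper's exact prompts.
from openai import OpenAI
import textstat

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical persona framing: respond as an empathetic community
# moderator rather than as a generic assistant.
PERSONA = (
    "You are an empathetic community moderator. Before responding, consider "
    "how the targeted group would feel reading the message."
)

def generate_cn(hate_speech: str) -> str:
    """Generate a short counter-narrative to a hateful message."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": (
                "Write a brief, respectful counter-narrative (2-3 sentences, "
                "plain language) to this message:\n" + hate_speech
            )},
        ],
    )
    return resp.choices[0].message.content

def readability_report(cn: str) -> dict:
    """Flag CNs written above a general-audience reading level."""
    grade = textstat.flesch_kincaid_grade(cn)  # U.S. school-grade estimate
    return {
        "words": len(cn.split()),        # verbosity proxy
        "fk_grade": grade,
        "college_level": grade >= 13.0,  # the accessibility concern
    }

if __name__ == "__main__":
    cn = generate_cn("Example hateful message goes here.")
    print(cn)
    print(readability_report(cn))
```

Running the same report over plain, persona-framed, and emotion-guided prompt variants is one way to approximate the verbosity-and-readability dimension of the evaluation described above.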
Similar Papers
Algorithmic Fairness in NLP: Persona-Infused LLMs for Human-Centric Hate Speech Detection
Computation and Language
Makes AI better at spotting hate speech fairly.
A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and Disinformation
Computation and Language
AI can create fake news tailored to you.
An Empirical Analysis of LLMs for Countering Misinformation
Computation and Language
Helps computers spot fake news, but needs improvement.