Towards Consistent Detection of Cognitive Distortions: LLM-Based Annotation and Dataset-Agnostic Evaluation
By: Neha Sharma, Navneet Agarwal, Kairit Sirts
Potential Business Impact:
Computers learn to spot bad thoughts better.
Text-based automated Cognitive Distortion detection is a challenging task due to its subjective nature, with low agreement scores observed even among expert human annotators, leading to unreliable annotations. We explore the use of Large Language Models (LLMs) as consistent and reliable annotators, and propose that multiple independent LLM runs can reveal stable labeling patterns despite the inherent subjectivity of the task. Furthermore, to fairly compare models trained on datasets with different characteristics, we introduce a dataset-agnostic evaluation framework using Cohen's kappa as an effect size measure. This methodology allows for fair cross-dataset and cross-study comparisons where traditional metrics like F1 score fall short. Our results show that GPT-4 can produce consistent annotations (Fleiss' kappa = 0.78), resulting in improved test set performance for models trained on these annotations compared to those trained on human-labeled data. Our findings suggest that LLMs can offer a scalable and internally consistent alternative for generating training data that supports strong downstream performance in subjective NLP tasks.
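The abstract describes two measurable ingredients: agreement across repeated LLM annotation runs (Fleiss' kappa) and chance-corrected evaluation against test labels (Cohen's kappa as an effect size). The sketch below is not the authors' code; it is a minimal illustration using the standard statsmodels and scikit-learn implementations, with a made-up binary label set and run count purely for demonstration.

```python
# Minimal sketch (not the paper's implementation): measure agreement across
# repeated LLM annotation runs with Fleiss' kappa, and score a trained model
# against test labels with Cohen's kappa as a chance-corrected measure that
# is comparable across datasets with different label distributions.
# The labels, number of runs, and binary task framing are illustrative assumptions.

import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical labels from 3 independent LLM runs over the same 6 texts
# (1 = cognitive distortion present, 0 = absent).
llm_runs = np.array([
    [1, 0, 1, 1, 0, 1],   # run 1
    [1, 0, 1, 0, 0, 1],   # run 2
    [1, 0, 1, 1, 0, 1],   # run 3
])

# Fleiss' kappa expects an items x categories count table; aggregate_raters
# builds it from an items x raters matrix, so transpose the runs-by-items array.
counts, _ = aggregate_raters(llm_runs.T)
print("Fleiss' kappa across LLM runs:", fleiss_kappa(counts))

# Cohen's kappa between a model's test predictions and the reference labels,
# usable as an effect-size-style metric for cross-dataset comparison where
# raw F1 scores are not directly comparable.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]
print("Cohen's kappa on the test set:", cohen_kappa_score(y_true, y_pred))
```

High Fleiss' kappa across runs would indicate the stable labeling patterns the abstract refers to; the Cohen's kappa step shows how a single chance-corrected number can stand in for dataset-specific metrics when comparing studies.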
Similar Papers
Catching UX Flaws in Code: Leveraging LLMs to Identify Usability Flaws at the Development Stage
Software Engineering
Computers check websites for problems faster.
Evaluating LLMs Without Oracle Feedback: Agentic Annotation Evaluation Through Unsupervised Consistency Signals
Computation and Language
Helps computers check their own work better.
Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection
Computation and Language
Finds fake writing by checking emotions.