LLMs and their Limited Theory of Mind: Evaluating Mental State Annotations in Situated Dialogue
By: Katharine Kowalyshyn, Matthias Scheutz
Potential Business Impact:
Helps teams spot misunderstandings in their conversations.
What if large language models could not only infer human mindsets but also expose every blind spot in team dialogue, such as discrepancies in team members' joint understanding? We present a novel two-step framework that leverages large language models (LLMs) both as human-style annotators of team dialogues, tracking the team's shared mental models (SMMs), and as automated detectors of discrepancies among individuals' mental states. In the first step, an LLM generates annotations by identifying SMM elements within task-oriented dialogues from the Cooperative Remote Search Task (CReST) corpus. In the second, another LLM compares these LLM-derived annotations and human annotations against gold-standard labels to detect and characterize divergences. We define an SMM coherence evaluation framework for this use case and apply it to six CReST dialogues, ultimately producing: (1) a dataset of human and LLM annotations; (2) a reproducible evaluation framework for SMM coherence; and (3) an empirical assessment of LLM-based discrepancy detection. Our results reveal that, although LLMs exhibit apparent coherence on straightforward natural-language annotation tasks, they systematically err in scenarios requiring spatial reasoning or disambiguation of prosodic cues.
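To make the two-step pipeline concrete, here is a minimal sketch of how an annotation pass followed by a discrepancy-detection pass could be wired together. Everything here is an assumption for illustration: the `call_llm` placeholder, the `Annotation` data class, the prompt wording, and the semicolon-separated output format are not the authors' implementation or annotation scheme.

```python
# Illustrative sketch of the two-step idea: (1) an LLM tags dialogue turns with
# shared-mental-model (SMM) elements, (2) a second LLM compares LLM and human
# annotations against gold-standard labels. All names and prompts are assumed.
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    turn_id: int             # index of the dialogue turn
    speaker: str             # who spoke the turn
    smm_elements: List[str]  # SMM elements attributed to the turn


def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client; wire up to your own LLM API."""
    raise NotImplementedError


def annotate_dialogue(turns: List[str]) -> List[Annotation]:
    """Step 1: ask an LLM to tag each 'Speaker: utterance' turn with SMM elements."""
    annotations = []
    for i, turn in enumerate(turns):
        speaker, utterance = turn.split(":", 1)
        reply = call_llm(
            "List shared-mental-model elements (e.g., task state, location "
            f"references, role commitments) in this turn, separated by ';':\n{utterance}"
        )
        annotations.append(Annotation(
            turn_id=i,
            speaker=speaker.strip(),
            smm_elements=[e.strip() for e in reply.split(";") if e.strip()],
        ))
    return annotations


def detect_discrepancies(llm_ann: List[Annotation],
                         human_ann: List[Annotation],
                         gold: List[Annotation]) -> List[str]:
    """Step 2: ask a second LLM to characterize divergences from the gold labels."""
    reports = []
    for l, h, g in zip(llm_ann, human_ann, gold):
        reply = call_llm(
            "Compare these annotations of the same dialogue turn and describe any "
            "divergence from the gold standard:\n"
            f"LLM: {l.smm_elements}\nHuman: {h.smm_elements}\nGold: {g.smm_elements}"
        )
        reports.append(f"turn {g.turn_id}: {reply}")
    return reports
```

In this reading, step 1 produces per-turn annotations and step 2 reduces them to a list of turn-level divergence reports; the actual framework's label set and comparison criteria are defined in the paper's SMM coherence evaluation, not here.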
Similar Papers
Mitigating Semantic Drift: Evaluating LLMs' Efficacy in Psychotherapy through MI Dialogue Summarization
Computation and Language
Helps AI understand therapy conversations better.
Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation
Computation and Language
Helps AI understand and work with people.
Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges
Artificial Intelligence
Lets computer characters act more like real people.