
RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems

Published: November 27, 2025 | arXiv ID: 2511.22275v1

By: Mengfan Li, Xuanhua Shi, Yang Deng

Potential Business Impact:

Helps computers understand what people want and need.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large language models (LLMs) are revolutionizing conversational recommender systems through their impressive capabilities in instruction comprehension, reasoning, and human interaction. A core factor underlying effective recommendation dialogue is the ability to infer and reason about users' mental states (such as desire, intention, and belief), a cognitive capacity commonly referred to as Theory of Mind (ToM). Despite growing interest in evaluating ToM in LLMs, current benchmarks predominantly rely on synthetic narratives inspired by the Sally-Anne test, which emphasize physical perception and fail to capture the complexity of mental state inference in realistic conversational settings. Moreover, existing benchmarks often overlook a critical component of human ToM: behavioral prediction, the ability to use inferred mental states to guide strategic decision-making and select appropriate conversational actions for future interactions. To better align LLM-based ToM evaluation with human-like social reasoning, we propose RecToM, a novel benchmark for evaluating ToM abilities in recommendation dialogues. RecToM focuses on two complementary dimensions: Cognitive Inference and Behavioral Prediction. The former focuses on understanding what has been communicated by inferring the underlying mental states. The latter emphasizes what should be done next, evaluating whether LLMs can leverage these inferred mental states to predict, select, and assess appropriate dialogue strategies. Extensive experiments on state-of-the-art LLMs demonstrate that RecToM poses a significant challenge. While the models exhibit partial competence in recognizing mental states, they struggle to maintain coherent, strategic ToM reasoning throughout dynamic recommendation dialogues, particularly in tracking evolving intentions and aligning conversational strategies with inferred mental states.

Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?

Computation and Language

Computers can guess what others think, but maybe not really.

2 Apr 2025 1

91%

Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection

Computation and Language

Lets computers understand what people think and feel.

26 Jan 2025 0

91%

Theory of Mind in Large Language Models: Assessment and Enhancement

Computation and Language

Helps computers understand what people are thinking.

26 Apr 2025 0


Country of Origin

🇸🇬 🇨🇳 China, Singapore

Repos / Data Links

github.com

Page Count

14 pages
