Overcoming Multi-step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner
By: Chunhui Zhang, Zhongyu Ouyang, Kwonjoon Lee, and more
Potential Business Impact:
Helps computers understand what people are thinking.
Theory-of-Mind (ToM) enables humans to infer mental states, such as beliefs, desires, and intentions, forming the foundation of social cognition. However, existing computational ToM methods rely on structured workflows with ToM-specific priors or deep model fine-tuning, which struggle with scalability in multimodal environments and fail to generalize as task complexity increases. To address these limitations, we propose a scalable Bayesian ToM planner that decomposes ToM reasoning into stepwise Bayesian updates. Our framework introduces weak-to-strong control, allowing smaller language models (LMs) to specialize in ToM-specific likelihood estimation and transfer their reasoning behaviors to larger LMs (7B to 405B) for integration with social and world knowledge. This synergistic approach aligns large-model inference of human mental states with Bayesian principles. Extensive experiments show that our method achieves a 4.6% accuracy improvement over state-of-the-art techniques on multimodal ToM benchmarks, including challenging unseen scenarios, thereby establishing a new standard for modeling human mental states in complex environments.
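To make the core idea concrete, here is a minimal sketch of a stepwise Bayesian update over hypothesized mental states. This is not the paper's implementation: the hypothesis names, likelihood values, and function names are all invented for illustration, and in the paper's framework the per-step likelihoods would come from a smaller LM specialized for ToM likelihood estimation.

```python
# Minimal sketch of stepwise Bayesian belief updating over hypothesized
# mental states (e.g., an observed agent's goal). All names and numbers
# below are illustrative assumptions, not values from the paper.

def bayesian_step(prior, likelihoods):
    """One Bayesian update: posterior is proportional to likelihood x prior."""
    unnorm = {h: likelihoods[h] * p for h, p in prior.items()}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

def infer_mental_state(prior, per_step_likelihoods):
    """Fold a sequence of per-step likelihoods into a posterior belief."""
    belief = dict(prior)
    for likelihoods in per_step_likelihoods:
        belief = bayesian_step(belief, likelihoods)
    return belief

# Uniform prior over two hypothesized goals.
prior = {"wants_coffee": 0.5, "wants_tea": 0.5}

# Per-step likelihoods P(observed action | goal); in the paper these would
# be estimated by a smaller, ToM-specialized LM.
steps = [
    {"wants_coffee": 0.8, "wants_tea": 0.3},  # agent walks to coffee machine
    {"wants_coffee": 0.9, "wants_tea": 0.2},  # agent picks up a mug
]

posterior = infer_mental_state(prior, steps)
```

Decomposing the inference this way keeps each step a simple, auditable Bayes update; the larger LM's role in the paper is to supply the social and world knowledge that shapes priors and candidate hypotheses.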
Similar Papers
From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models
Artificial Intelligence
Helps AI understand what others think and feel.
Theory of Mind in Large Language Models: Assessment and Enhancement
Computation and Language
Helps computers understand what people are thinking.
Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
Computation and Language
Computers can guess what others think, but maybe not really.