LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game
By: Fangzhou Liang, Tianshi Zheng, Chunkit Chan, and more
Potential Business Impact:
Helps AI agents understand their partners' intentions so they can collaborate in games more effectively.
Effective multi-agent collaboration requires agents to infer the rationale behind others' actions, a capability rooted in Theory-of-Mind (ToM). While recent Large Language Models (LLMs) excel at logical inference, their ability to infer rationale in dynamic, collaborative settings remains under-explored. This study introduces LLM-Hanabi, a novel benchmark that uses the cooperative game Hanabi to evaluate the rationale inference and ToM of LLMs. Our framework features an automated evaluation system that measures both game performance and ToM proficiency. Across a range of models, we find a significant positive correlation between ToM and in-game success. Notably, first-order ToM (interpreting others' intent) correlates more strongly with performance than second-order ToM (predicting others' interpretations). These findings highlight that for effective AI collaboration, the ability to accurately interpret a partner's rationale is more critical than higher-order reasoning. We conclude that prioritizing first-order ToM is a promising direction for enhancing the collaborative capabilities of future models.
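To illustrate the kind of analysis behind the reported finding, here is a minimal sketch of how one might check whether first-order ToM scores track game performance more closely than second-order ToM scores. This is not the paper's code; the model names and score values below are placeholder assumptions for demonstration only.

```python
# Illustrative sketch, not the LLM-Hanabi evaluation pipeline.
# Compares Pearson correlations of first- vs. second-order ToM scores
# against game scores across a set of hypothetical models.
from statistics import correlation  # Pearson correlation, Python 3.10+

# Hypothetical per-model aggregates: (first-order ToM, second-order ToM, game score)
results = {
    "model_a": (0.82, 0.70, 17.5),
    "model_b": (0.64, 0.58, 12.0),
    "model_c": (0.91, 0.75, 21.0),
    "model_d": (0.55, 0.52, 9.5),
}

first_order = [v[0] for v in results.values()]
second_order = [v[1] for v in results.values()]
game_scores = [v[2] for v in results.values()]

print("r(first-order ToM, game score)  =", round(correlation(first_order, game_scores), 3))
print("r(second-order ToM, game score) =", round(correlation(second_order, game_scores), 3))
```

Under the paper's finding, the first correlation would come out higher than the second, indicating that interpreting a partner's intent matters more for in-game success than predicting how the partner interprets you.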
Similar Papers
Towards Cognitive Synergy in LLM-Based Multi-Agent Systems: Integrating Theory of Mind and Critical Evaluation
Multiagent Systems
Makes AI teams think and work together better.
Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection
Computation and Language
Lets computers understand what people think and feel.
Theory of Mind Using Active Inference: A Framework for Multi-Agent Cooperation
Artificial Intelligence
Lets AI robots cooperate by guessing others' thoughts from their actions.