Map2Thought: Explicit 3D Spatial Reasoning via Metric Cognitive Maps

Published: January 16, 2026 | arXiv ID: 2601.11442v1

By: Xiangjun Gao, Zhensong Zhang, Dave Zhenyu Chen and more

Potential Business Impact:

Helps robots understand and navigate 3D spaces.

Business Areas:
Mapping Services, Navigation and Mapping

We propose Map2Thought, a framework that enables explicit and interpretable spatial reasoning for 3D vision-language models (VLMs). The framework is built on two key components: the Metric Cognitive Map (Metric-CogMap) and Cognitive Chain-of-Thought (Cog-CoT). Metric-CogMap provides a unified spatial representation by integrating a discrete grid for relational reasoning with a continuous, metric-scale representation for precise geometric understanding. Building on the Metric-CogMap, Cog-CoT performs explicit geometric reasoning through deterministic operations, including vector operations, bounding-box distances, and occlusion-aware appearance-order cues, producing interpretable inference traces grounded in 3D structure. Experimental results show that Map2Thought enables explainable 3D understanding, achieving 59.9% accuracy using only half the supervision, closely matching the 60.9% baseline trained with the full dataset. On VSI-Bench, it consistently outperforms state-of-the-art methods by 5.3%, 4.8%, and 4.0% under 10%, 25%, and 50% training subsets, respectively.
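To make the deterministic operations named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes objects are stored as axis-aligned bounding boxes in metres and that the discrete grid is obtained by quantizing box centers on the ground plane. The function names (`aabb_distance`, `center`, `to_grid_cell`), the box layout, and the 0.5 m cell size are all hypothetical choices made for this example.

```python
import math

# Hypothetical box format: (min_corner, max_corner), each an (x, y, z) tuple in metres.

def aabb_distance(box_a, box_b):
    """Minimum Euclidean distance between two axis-aligned boxes (0 if they overlap)."""
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    # Per-axis gap between the boxes; zero when their extents overlap on that axis.
    gaps = [max(b_min[i] - a_max[i], a_min[i] - b_max[i], 0.0) for i in range(3)]
    return math.sqrt(sum(g * g for g in gaps))

def center(box):
    """Geometric center of a box."""
    lo, hi = box
    return tuple((lo[i] + hi[i]) / 2.0 for i in range(3))

def to_grid_cell(point, cell_size=0.5):
    """Quantize a metric point to a discrete (column, row) cell on the ground plane.

    The cell size is an assumed value, not one taken from the paper.
    """
    return (math.floor(point[0] / cell_size), math.floor(point[1] / cell_size))

# Two made-up objects, used only to exercise the functions.
chair = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
table = ((4.0, 5.0, 0.0), (5.0, 6.0, 1.0))

metric_gap = aabb_distance(chair, table)   # continuous, metric-scale quantity
chair_cell = to_grid_cell(center(chair))   # discrete cell for relational reasoning
table_cell = to_grid_cell(center(table))
```

In this sketch the same scene yields both a metric quantity (the gap between the boxes) and a coarse grid position per object, which mirrors the discrete-plus-continuous split the abstract describes. Occlusion-aware appearance order is not modeled here.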

Page Count
21 pages

Category
Computer Science:
CV and Pattern Recognition