OpenMap: Instruction Grounding via Open-Vocabulary Visual-Language Mapping
By: Danyang Li, Zenghui Yang, Guangpeng Qi, and more
Potential Business Impact:
Lets robots follow natural-language directions in real rooms.
Grounding natural language instructions to visual observations is fundamental for embodied agents operating in open-world environments. Recent advances in visual-language mapping have enabled generalizable semantic representations by leveraging vision-language models (VLMs). However, these methods often fall short in aligning free-form language commands with specific scene instances, due to limitations in both instance-level semantic consistency and instruction interpretation. We present OpenMap, a zero-shot open-vocabulary visual-language map designed for accurate instruction grounding in navigation tasks. To address semantic inconsistencies across views, we introduce a Structural-Semantic Consensus constraint that jointly considers global geometric structure and vision-language similarity to guide robust 3D instance-level aggregation. To improve instruction interpretation, we propose an LLM-assisted Instruction-to-Instance Grounding module that enables fine-grained instance selection by incorporating spatial context and expressive target descriptions. We evaluate OpenMap on ScanNet200 and Matterport3D, covering both semantic mapping and instruction-to-target retrieval tasks. Experimental results show that OpenMap outperforms state-of-the-art baselines in zero-shot settings, demonstrating the effectiveness of our method in bridging free-form language and 3D perception for embodied navigation.
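The abstract describes two ideas: merging per-view instance fragments only when geometry and vision-language semantics agree, and then retrieving the instance that best matches a target description. The sketch below illustrates that general pattern. It is not OpenMap's implementation. The `Fragment` type, the smaller-set voxel-overlap ratio (a simple stand-in for "global geometric structure"), the two thresholds, mean-pooled features, and the function names are all hypothetical. The LLM step is omitted, and the target description is assumed to be already embedded into the same feature space.

```python
import math
from dataclasses import dataclass


@dataclass
class Fragment:
    """One per-view instance observation (hypothetical representation)."""
    voxels: frozenset  # occupied voxel indices, e.g. (i, j, k) tuples
    feature: tuple     # vision-language embedding, e.g. from a CLIP-style encoder


def cosine(a, b):
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def overlap(a, b):
    """Shared voxels relative to the smaller fragment: an assumed geometric check."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_fragments(frags, geo_thresh=0.5, sem_thresh=0.8):
    """Merge fragments only when geometry AND semantics both agree (a consensus rule).

    Uses union-find over pairwise checks, so merges are transitive:
    if A~B and B~C, then A, B, and C end up in one instance.
    """
    parent = list(range(len(frags)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(frags)):
        for j in range(i + 1, len(frags)):
            geo_ok = overlap(frags[i].voxels, frags[j].voxels) >= geo_thresh
            sem_ok = cosine(frags[i].feature, frags[j].feature) >= sem_thresh
            if geo_ok and sem_ok:
                parent[find(i)] = find(j)

    # Group fragment indices by their union-find root.
    groups = {}
    for i in range(len(frags)):
        groups.setdefault(find(i), []).append(i)

    # Fuse each group: union of voxels, mean of embeddings.
    instances = []
    for members in groups.values():
        voxels = frozenset().union(*(frags[m].voxels for m in members))
        dim = len(frags[members[0]].feature)
        mean = tuple(
            sum(frags[m].feature[d] for m in members) / len(members) for d in range(dim)
        )
        instances.append(Fragment(voxels, mean))
    return instances


def ground(instances, query_feature):
    """Index of the instance whose embedding best matches the query (None if empty)."""
    scores = [cosine(inst.feature, query_feature) for inst in instances]
    return max(range(len(instances)), key=scores.__getitem__, default=None)


if __name__ == "__main__":
    chair_a = Fragment(frozenset({(0, 0, 0), (0, 0, 1), (0, 1, 0)}), (1.0, 0.1, 0.0))
    chair_b = Fragment(frozenset({(0, 0, 1), (0, 1, 0), (0, 1, 1)}), (0.9, 0.2, 0.0))
    table = Fragment(frozenset({(0, 0, 1), (0, 1, 0), (5, 5, 5)}), (0.0, 0.1, 1.0))
    instances = merge_fragments([chair_a, chair_b, table])
    print(len(instances))  # 2: the chair views merge; the overlapping table does not
    print(ground(instances, (0.0, 0.0, 1.0)))  # 1 (the table instance)
```

The table fragment overlaps the chair geometrically but has a dissimilar embedding, so requiring both checks keeps it separate. A purely geometric rule would have fused the two objects.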
Similar Papers
Multimodal Spatial Language Maps for Robot Navigation and Manipulation
Robotics
Robots understand and go to places using words and senses.
DSM: Building A Diverse Semantic Map for 3D Visual Grounding
CV and Pattern Recognition
Helps robots understand and interact with their surroundings.
OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics
Robotics
Robots see and understand the world perfectly.