Think, Remember, Navigate: Zero-Shot Object-Goal Navigation with VLM-Powered Reasoning
By: Mobin Habibpour, Fatemeh Afghah
Potential Business Impact:
Helps robots explore new places much faster.
While Vision-Language Models (VLMs) are set to transform robotic navigation, existing methods often underutilize their reasoning capabilities. To unlock the full potential of VLMs in robotics, we shift their role from passive observers to active strategists in the navigation process. Our framework delegates high-level planning to a VLM, which leverages its contextual understanding to guide a frontier-based exploration agent. This guidance rests on three techniques: structured chain-of-thought prompting that elicits logical, step-by-step reasoning; dynamic inclusion of the agent's recent action history to prevent it from getting stuck in loops; and a novel capability that lets the VLM interpret top-down obstacle maps alongside first-person views, thereby enhancing spatial awareness. Evaluated on challenging benchmarks such as HM3D, Gibson, and MP3D, the method produces exceptionally direct and logical trajectories, marking a substantial improvement in navigation efficiency over existing approaches and charting a path toward more capable embodied agents.
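To make the high-level loop concrete, here is a minimal sketch (not the authors' released code) of how a VLM might be prompted with a structured chain of thought, the recent action history, and both visual inputs to pick an exploration frontier. The `query_vlm` function, the frontier IDs, and the prompt wording are illustrative assumptions standing in for whatever multimodal API and map representation the framework actually uses.

```python
# Hedged sketch of VLM-guided frontier selection: chain-of-thought prompt,
# recent action history, and dual visual inputs (first-person view + top-down
# obstacle map). All names and formats here are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class NavigationContext:
    goal: str                                   # e.g. "chair"
    frontier_ids: List[int]                     # candidate frontiers from the exploration map
    action_history: List[str] = field(default_factory=list)  # recent actions, used to avoid loops


def build_prompt(ctx: NavigationContext) -> str:
    """Compose a structured, step-by-step prompt from the goal, frontiers, and history."""
    history = ", ".join(ctx.action_history[-5:]) or "none"
    return (
        f"Goal: navigate to the nearest {ctx.goal}.\n"
        f"Candidate frontiers (marked on the top-down obstacle map): {ctx.frontier_ids}.\n"
        f"Recent actions: {history}. Avoid revisiting areas already explored.\n"
        "Step 1: Describe what the first-person view suggests about nearby rooms.\n"
        "Step 2: Relate that to the free space and obstacles on the top-down map.\n"
        "Step 3: Reason about which frontier most likely leads toward the goal.\n"
        "Finish with the chosen frontier on its own line as 'FRONTIER: <id>'."
    )


def query_vlm(prompt: str, first_person_png: bytes, topdown_map_png: bytes) -> str:
    """Hypothetical multimodal VLM call; replace with a real API or model client."""
    raise NotImplementedError


def choose_frontier(ctx: NavigationContext,
                    first_person_png: bytes,
                    topdown_map_png: bytes) -> int:
    """Parse the VLM's reasoning output; fall back to the first frontier if parsing fails."""
    reply = query_vlm(build_prompt(ctx), first_person_png, topdown_map_png)
    for line in reversed(reply.splitlines()):
        if line.strip().upper().startswith("FRONTIER:"):
            return int(line.split(":", 1)[1])
    return ctx.frontier_ids[0]
```

In a setup like this, the agent would call `choose_frontier` at each replanning step and hand the selected frontier to the frontier-based controller for low-level navigation, keeping the VLM in the role of strategist rather than step-by-step pilot.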
Similar Papers
STRIVE: Structured Representation Integrating VLM Reasoning for Efficient Object Navigation
Robotics
Helps robots find objects in new places faster.
ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models
Robotics
Robots learn to explore and do tasks better.
SoraNav: Adaptive UAV Task-Centric Navigation via Zeroshot VLM Reasoning
Robotics
Drones follow spoken directions in 3D spaces.