Using Vision Language Models as Closed-Loop Symbolic Planners for Robotic Applications: A Control-Theoretic Perspective

Published: November 10, 2025 | arXiv ID: 2511.07410v1

By: Hao Wang, Sathwik Karnik, Bea Lim and more

BigTech Affiliations: Stanford University

Potential Business Impact:

Helps robots plan actions by seeing and thinking.

Business Areas:

Robotics Hardware, Science and Engineering, Software

Large Language Models (LLMs) and Vision Language Models (VLMs) have been widely used for embodied symbolic planning. Yet, how to effectively use these models for closed-loop symbolic planning remains largely unexplored. Because they operate as black boxes, LLMs and VLMs can produce unpredictable or costly errors, making their use in high-level robotic planning especially challenging. In this work, we investigate how to use VLMs as closed-loop symbolic planners for robotic applications from a control-theoretic perspective. Concretely, we study how the control horizon and warm-starting impact the performance of VLM symbolic planners. We design and conduct controlled experiments to gain insights that are broadly applicable to utilizing VLMs as closed-loop symbolic planners, and we discuss recommendations that can help improve the performance of VLM symbolic planners.
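The abstract does not spell out how control horizon and warm-starting enter the planning loop, so the following is only a minimal sketch under assumptions borrowed from receding-horizon control: the control horizon is taken to be the number of planned actions executed before the planner is queried again, and warm-starting is taken to mean passing the unexecuted tail of the previous plan back to the planner as a hint. The names `closed_loop_plan`, `run_demo`, and the toy block-stacking planner are hypothetical stand-ins for a VLM call, not the authors' implementation.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]


def closed_loop_plan(
    observe: Callable[[], int],
    planner: Callable[[int, Optional[Plan]], Plan],
    execute: Callable[[str], None],
    goal_reached: Callable[[int], bool],
    control_horizon: int = 1,
    warm_start: bool = True,
    max_replans: int = 20,
) -> Tuple[bool, int]:
    """Run a receding-horizon loop; return (success, number of planner calls)."""
    if control_horizon < 1:
        raise ValueError("control_horizon must be at least 1")
    previous_tail: Optional[Plan] = None
    for calls_so_far in range(max_replans):
        state = observe()  # close the loop: re-observe before each re-plan
        if goal_reached(state):
            return True, calls_so_far
        # Assumption: warm-starting = seeding the planner with the leftover plan.
        hint = previous_tail if warm_start else None
        plan = planner(state, hint)
        if not plan:
            return False, calls_so_far + 1
        # Execute only the first `control_horizon` actions, then re-plan.
        for action in plan[:control_horizon]:
            execute(action)
        previous_tail = plan[control_horizon:]
    return goal_reached(observe()), max_replans


def run_demo(control_horizon: int) -> Tuple[bool, int]:
    """Toy world (hypothetical): stack three blocks, one per 'stack' action."""
    world = {"stacked": 0}

    def observe() -> int:
        return world["stacked"]

    def planner(state: int, hint: Optional[Plan]) -> Plan:
        # Stand-in for a VLM query: reuse a non-empty hint, else plan afresh.
        return hint if hint else ["stack"] * (3 - state)

    def execute(action: str) -> None:
        world["stacked"] += 1

    return closed_loop_plan(
        observe, planner, execute, lambda s: s >= 3,
        control_horizon=control_horizon,
    )


if __name__ == "__main__":
    print(run_demo(1))  # (True, 3)
    print(run_demo(3))  # (True, 1)
```

In this toy setting a control horizon of 1 queries the planner three times, while a horizon of 3 queries it once. That is the basic trade-off the sketch is meant to expose: a shorter horizon re-observes more often at the price of more planner calls.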

ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models

Robotics

Robots learn to explore and do tasks better.

16 Aug 2025

91%

VLM-driven Behavior Tree for Context-aware Task Planning

Robotics

Robots understand what they see to act.

7 Jan 2025

90%

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

Robotics

Robots learn to do tasks by watching and listening.

18 Aug 2025

Country of Origin

🇺🇸 United States

Page Count

23 pages