
Using Vision Language Models as Closed-Loop Symbolic Planners for Robotic Applications: A Control-Theoretic Perspective

Published: November 10, 2025 | arXiv ID: 2511.07410v1

By: Hao Wang, Sathwik Karnik, Bea Lim, and others

Affiliations: Stanford University

Potential Business Impact:

Helps robots plan multi-step tasks by letting a vision language model look at the scene and revise the plan as the robot acts.

Business Areas:
Robotics Hardware, Science and Engineering, Software

Large Language Models (LLMs) and Vision Language Models (VLMs) have been widely used for embodied symbolic planning. Yet, how to effectively use these models for closed-loop symbolic planning remains largely unexplored. Because they operate as black boxes, LLMs and VLMs can produce unpredictable or costly errors, making their use in high-level robotic planning especially challenging. In this work, we investigate how to use VLMs as closed-loop symbolic planners for robotic applications from a control-theoretic perspective. Concretely, we study how the control horizon and warm-starting impact the performance of VLM symbolic planners. We design and conduct controlled experiments to gain insights that are broadly applicable to utilizing VLMs as closed-loop symbolic planners, and we discuss recommendations that can help improve the performance of VLM symbolic planners.
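The abstract borrows two ideas from receding-horizon (model predictive) control: the control horizon, which is how many planned actions are executed before the planner is queried again, and warm-starting, which seeds the next query with the previous plan. The sketch below is a minimal illustration of that loop and is not the authors' implementation. It assumes a toy 1-D world in which the symbolic state is an integer cell index. The hypothetical `toy_planner` stands in for a VLM query, and `make_executor` injects a scripted disturbance. All names here (`closed_loop_plan`, `toy_planner`, `make_executor`, `STEP`) are invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Plan = List[str]
# Toy 1-D world (assumption): the symbolic state is an integer cell index.
STEP: Dict[str, int] = {"right": 1, "left": -1}


def toy_planner(state: int, goal: int, warm_start: Optional[Plan]) -> Plan:
    """Hypothetical stand-in for a VLM query that returns symbolic actions."""
    # Warm start: reuse the previous plan's unexecuted suffix if it still
    # reaches the goal from the newly observed state.
    if warm_start is not None and state + sum(STEP[a] for a in warm_start) == goal:
        return list(warm_start)
    # Otherwise plan from scratch.
    action = "right" if goal > state else "left"
    return [action] * abs(goal - state)


def make_executor(slip_steps: Set[int]) -> Callable[[int, str], int]:
    """Executor with a scripted disturbance: actions at the given 0-based
    execution indices silently fail (the robot does not move)."""
    count = 0

    def execute(state: int, action: str) -> int:
        nonlocal count
        index = count
        count += 1
        if index in slip_steps:
            return state  # disturbance: action has no effect
        return state + STEP[action]

    return execute


def closed_loop_plan(
    state: int,
    goal: int,
    planner: Callable[[int, int, Optional[Plan]], Plan],
    execute: Callable[[int, str], int],
    control_horizon: int,
    max_iters: int = 20,
) -> Tuple[int, List[str], int]:
    """Receding-horizon loop: plan, run the first `control_horizon` actions,
    observe the resulting state, and re-plan.
    Returns (final_state, executed_actions, planner_calls)."""
    if control_horizon < 1:
        raise ValueError("control_horizon must be at least 1")
    warm_start: Optional[Plan] = None
    executed: List[str] = []
    calls = 0
    for _ in range(max_iters):
        if state == goal:
            break
        plan = planner(state, goal, warm_start)
        calls += 1
        if not plan:
            break
        # Execute only the first `control_horizon` actions, then close the loop.
        for action in plan[:control_horizon]:
            state = execute(state, action)
            executed.append(action)
        # The unexecuted suffix seeds (warm-starts) the next planner call.
        warm_start = plan[control_horizon:]
    return state, executed, calls
```

In this toy run, the robot starts at cell 0 with the goal at cell 4, and the second executed action slips. All three settings reach the goal. A control horizon of 1 issues 5 planner calls, a horizon of 2 issues 3, and a horizon of 100 (run the whole plan before re-planning) issues 2. This mirrors the trade-off the abstract alludes to: shorter horizons react sooner to disturbances but cost more queries. The warm start lets the planner skip re-planning whenever the leftover suffix is still valid.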

Country of Origin
🇺🇸 United States

Page Count
23 pages

Category
Computer Science: Robotics