Using Vision Language Models as Closed-Loop Symbolic Planners for Robotic Applications: A Control-Theoretic Perspective
By: Hao Wang, Sathwik Karnik, Bea Lim, and more
Potential Business Impact:
Helps robots plan actions by seeing and thinking.
Large Language Models (LLMs) and Vision Language Models (VLMs) have been widely used for embodied symbolic planning. Yet how to use these models effectively for closed-loop symbolic planning remains largely unexplored. Because they operate as black boxes, LLMs and VLMs can produce unpredictable or costly errors, which makes their use in high-level robotic planning especially challenging. In this work, we investigate how to use VLMs as closed-loop symbolic planners for robotic applications from a control-theoretic perspective. Concretely, we study how the control horizon and warm-starting affect the performance of VLM symbolic planners. We design and conduct controlled experiments to derive insights that apply broadly to using VLMs as closed-loop symbolic planners, and we discuss recommendations that can help improve the performance of VLM symbolic planners.
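The abstract frames closed-loop planning in control-theoretic terms. As a rough illustration only, the Python sketch below shows a receding-horizon loop in the style of model predictive control. A planner proposes a full sequence of symbolic actions, only the first `control_horizon` actions are executed, and the planner is queried again from the newly observed state, optionally warm-started with the unexecuted remainder of its previous plan. The toy block-stacking task, the planner and executor, and every function name here are hypothetical stand-ins for a VLM and a robot; the paper's actual prompting and warm-start scheme may differ.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[str, ...]
# Hypothetical planner interface standing in for a VLM query: it receives the
# currently observed state plus an optional warm-start plan and returns a full
# sequence of symbolic actions.
Planner = Callable[[State, Optional[List[str]]], List[str]]


def make_toy_planner(goal: State) -> Planner:
    """Toy block-stacking planner (illustrative only, not a VLM)."""
    def plan(state: State, warm: Optional[List[str]]) -> List[str]:
        remaining = [f"stack {block}" for block in goal[len(state):]]
        # Accept the warm-start suggestion only if it is still consistent
        # with the observed state; otherwise plan from scratch.
        return list(warm) if warm == remaining else remaining
    return plan


def execute(state: State, action: str) -> State:
    """Toy executor: 'stack X' places block X on top of the stack."""
    return state + (action.split()[1],)


def closed_loop(goal: State, planner: Planner, control_horizon: int,
                warm_start: bool, max_calls: int = 100) -> Tuple[State, int]:
    """Receding-horizon loop: plan, execute the first `control_horizon`
    actions, observe the new state, and replan until the goal is reached.
    Returns the final state and the number of planner calls."""
    if control_horizon < 1:
        raise ValueError("control_horizon must be at least 1")
    state: State = ()
    warm: Optional[List[str]] = None
    calls = 0
    while state != goal and calls < max_calls:
        plan = planner(state, warm if warm_start else None)
        calls += 1
        if not plan:
            break  # planner returned nothing; a real system needs a fallback
        for action in plan[:control_horizon]:
            state = execute(state, action)
        # The unexecuted tail of the plan becomes the next warm start.
        warm = plan[control_horizon:]
    return state, calls


if __name__ == "__main__":
    goal = ("A", "B", "C")
    planner = make_toy_planner(goal)
    for horizon in (1, 2, 3):
        print(horizon, closed_loop(goal, planner, horizon, warm_start=True))
```

In this toy, a shorter control horizon means more planner calls (three calls for a horizon of 1, one call for a horizon of 3), mirroring the trade-off between reacting to fresh observations and the cost of each model query. With a real VLM, the warm-start plan would presumably be supplied as part of the prompt; that detail is an assumption of this sketch, not a description of the authors' method.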