10 Open Challenges Steering the Future of Vision-Language-Action Models

Published: November 8, 2025 | arXiv ID: 2511.05936v1

By: Soujanya Poria, Navonil Majumder, Chia-Yu Hung, and more

Potential Business Impact:

Robots that learn to follow spoken commands and act on them.

Business Areas:
Autonomous Vehicles, Transportation

Due to their ability to follow natural language instructions, vision-language-action (VLA) models are increasingly prevalent in the embodied AI arena, following the widespread success of their precursors, LLMs and VLMs. In this paper, we discuss 10 principal milestones in the ongoing development of VLA models: multimodality, reasoning, data, evaluation, cross-robot action generalization, efficiency, whole-body coordination, safety, agents, and coordination with humans. Furthermore, we discuss the emerging trends of spatial understanding, modeling world dynamics, post-training, and data synthesis, all aimed at reaching these milestones. Through these discussions, we hope to draw attention to the research avenues that may accelerate the development of VLA models toward wider acceptance.

Page Count
11 pages

Category
Computer Science:
Robotics