Foundation Model Driven Robotics: A Comprehensive Review
By: Muhammad Tayyab Khan, Ammar Waheed
Potential Business Impact:
Robots understand and do tasks better with smart AI.
The rapid emergence of foundation models, particularly Large Language Models (LLMs) and Vision-Language Models (VLMs), has introduced a transformative paradigm in robotics. These models offer powerful capabilities in semantic understanding, high-level reasoning, and cross-modal generalization, enabling significant advances in perception, planning, control, and human-robot interaction. This critical review provides a structured synthesis of recent developments, categorizing applications across simulation-driven design, open-world execution, sim-to-real transfer, and adaptable robotics. Unlike existing surveys that emphasize isolated capabilities, this work highlights integrated, system-level strategies and evaluates their practical feasibility in real-world environments. Key enabling trends such as procedural scene generation, policy generalization, and multimodal reasoning are discussed alongside core bottlenecks, including limited embodiment, lack of multimodal data, safety risks, and computational constraints. Through this lens, this paper identifies both the architectural strengths and critical limitations of foundation model-based robotics, highlighting open challenges in real-time operation, grounding, resilience, and trust. The review concludes with a roadmap for future research aimed at bridging semantic reasoning and physical intelligence through more robust, interpretable, and embodied models.