Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs
By: Sushil Samuel Dinesh, Shinkyu Park
Potential Business Impact:
Robots can perform new manipulation tasks using off-the-shelf AI models, with no domain-specific training.
This paper presents a framework that leverages pre-trained foundation models for robotic manipulation without domain-specific training. The framework integrates off-the-shelf models, combining multimodal perception from foundation models with a general-purpose reasoning model capable of robust task sequencing. Scene graphs, dynamically maintained within the framework, provide spatial awareness and enable consistent reasoning about the environment. The framework is evaluated through a series of tabletop robotic manipulation experiments, and the results highlight its potential for building robotic manipulation systems directly on top of off-the-shelf foundation models.
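The abstract does not include an implementation, but the scene-graph idea it describes can be illustrated with a minimal sketch: detected objects become nodes, and coarse spatial relations become edges that can be serialized as text for a reasoning model. All class names, thresholds, and relation rules below are hypothetical illustrations under simple assumptions, not the authors' actual method.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    """A detected object with its estimated 3D position in meters."""
    name: str
    position: tuple  # (x, y, z) in the robot's base frame


@dataclass
class SceneGraph:
    """Minimal scene graph: objects as nodes, spatial relations as edges."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (subject, relation, object)

    def update_object(self, name, position):
        """Insert or move an object, then recompute all relations."""
        self.nodes[name] = ObjectNode(name, position)
        self._recompute_relations()

    def _recompute_relations(self):
        """Derive coarse pairwise relations from positions (hypothetical rules)."""
        self.edges = []
        items = list(self.nodes.values())
        for a in items:
            for b in items:
                if a.name == b.name:
                    continue
                ax, ay, az = a.position
                bx, by, bz = b.position
                # "on": roughly aligned horizontally, with a slightly above b
                if abs(ax - bx) < 0.05 and abs(ay - by) < 0.05 and 0 < az - bz < 0.15:
                    self.edges.append((a.name, "on", b.name))
                elif ax < bx - 0.05:
                    self.edges.append((a.name, "left_of", b.name))

    def describe(self):
        """Serialize the graph as text for a language reasoning model."""
        return "; ".join(f"{s} {r} {o}" for s, r, o in self.edges)


if __name__ == "__main__":
    graph = SceneGraph()
    graph.update_object("table", (0.5, 0.0, 0.0))
    graph.update_object("red_cube", (0.5, 0.0, 0.05))
    graph.update_object("blue_bowl", (0.7, 0.0, 0.05))
    # Prints relations such as "red_cube on table; red_cube left_of blue_bowl"
    print(graph.describe())
```

In the paper's actual framework, the graph would be populated and dynamically maintained from foundation-model perception rather than hard-coded positions, and the serialized relations would ground the reasoning model's task sequencing.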
Similar Papers
Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives
Robotics
Robots plan and learn manipulation tasks with the help of foundation models.
Foundation Model Driven Robotics: A Comprehensive Review
Robotics
Robots perceive, reason, and act better when driven by foundation models.
Leveraging Foundation Models for Enhancing Robot Perception and Action
Robotics
Robots perceive and act more capably in cluttered, unstructured environments.