Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs
By: Sushil Samuel Dinesh, Shinkyu Park
Potential Business Impact:
Robots learn new tasks without special training.
This paper presents a framework that leverages pre-trained foundation models for robotic manipulation without domain-specific training. It integrates off-the-shelf models, pairing multimodal perception from foundation models with a general-purpose reasoning model capable of robust task sequencing. Scene graphs, dynamically maintained within the framework, provide spatial awareness and enable consistent reasoning about the environment. The framework is evaluated in a series of tabletop manipulation experiments, and the results highlight its potential for building manipulation systems directly on top of off-the-shelf foundation models.
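To make the scene-graph idea concrete: a scene graph here is a data structure whose nodes are detected objects and whose edges encode spatial relations, updated as perception reports change and serialized into text the reasoning model can consume. The sketch below is a minimal illustration under those assumptions; all names (SceneGraph, ObjectNode, update_object, set_relation, describe) and the relation vocabulary are hypothetical, not the paper's implementation.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a dynamically maintained scene graph for a tabletop
# scene. All class, method, and relation names are assumptions for
# illustration, not the paper's API.

@dataclass
class ObjectNode:
    name: str                              # e.g. "red_cube"
    position: tuple[float, float, float]   # estimated 3D position from perception

@dataclass
class SceneGraph:
    nodes: dict[str, ObjectNode] = field(default_factory=dict)
    # Directed spatial relations: (subject, relation) -> object,
    # e.g. ("red_cube", "on") -> "table"
    edges: dict[tuple[str, str], str] = field(default_factory=dict)

    def update_object(self, name: str, position: tuple[float, float, float]) -> None:
        """Add or refresh an object node from the latest perception output."""
        self.nodes[name] = ObjectNode(name, position)

    def set_relation(self, subject: str, relation: str, obj: str) -> None:
        """Record a spatial relation between two tracked objects."""
        self.edges[(subject, relation)] = obj

    def describe(self) -> str:
        """Serialize the graph as text a reasoning model can condition on."""
        facts = [f"{s} is {r.replace('_', ' ')} {o}" for (s, r), o in self.edges.items()]
        return "; ".join(facts)

# Example: perception reports two objects; the reasoning model then receives
# "red_cube is on table" as grounded scene context for task sequencing.
graph = SceneGraph()
graph.update_object("table", (0.0, 0.0, 0.0))
graph.update_object("red_cube", (0.1, 0.2, 0.05))
graph.set_relation("red_cube", "on", "table")
print(graph.describe())
```

In this style of design, keeping the graph as the single source of spatial state (rather than re-querying the perception model each step) is what lets the reasoning model plan consistently over long task sequences.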
Similar Papers
Leveraging Foundation Models for Enhancing Robot Perception and Action
Robotics
Robots learn to do more things in messy places.
Language-Guided Long Horizon Manipulation with LLM-based Planning and Visual Perception
Robotics
Robots learn to fold clothes from instructions.
MORE: Mobile Manipulation Rearrangement Through Grounded Language Reasoning
Robotics
Robots can now move many objects in big places.