VLM-driven Skill Selection for Robotic Assembly Tasks
By: Jeong-Jung Kim, Doo-Yeol Koh, Chang-Hyun Kim
Potential Business Impact:
A robot builds things by watching and listening.
This paper presents a robotic assembly framework that combines Vision-Language Models (VLMs) with imitation learning for manipulation tasks. Our system employs a gripper-equipped robot operating in 3D space to perform assembly operations. The framework integrates visual perception, natural language understanding, and learned primitive skills to enable flexible, adaptive robotic manipulation. Experimental results demonstrate the effectiveness of the approach in assembly scenarios, achieving high success rates while remaining interpretable through its structured decomposition into primitive skills.
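To make the described pipeline concrete, here is a minimal sketch of VLM-driven skill selection: a VLM decomposes a natural-language instruction and a scene description into a sequence of primitive skills, each executed by a learned policy. This is not the authors' implementation; all names (`SkillCall`, `query_vlm`, the `pick`/`insert` skills) are illustrative assumptions, and the VLM call and skill policies are stubbed out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SkillCall:
    """One step of the assembly plan produced by the VLM (assumed structure)."""
    name: str    # primitive skill identifier, e.g. "pick" or "insert"
    target: str  # object the skill acts on, e.g. "peg" or "hole"


def query_vlm(instruction: str, scene_description: str) -> List[SkillCall]:
    """Placeholder for the VLM query that decomposes an instruction into
    primitive skills. A real system would prompt a vision-language model
    with the camera image and the instruction; here we return a fixed plan."""
    return [SkillCall("pick", "peg"), SkillCall("insert", "hole")]


def make_skill_library() -> Dict[str, Callable[[str], bool]]:
    """Each primitive skill would wrap an imitation-learned policy driving
    the gripper-equipped robot; print statements stand in for those policies."""
    def pick(target: str) -> bool:
        print(f"[pick] grasping {target}")
        return True

    def insert(target: str) -> bool:
        print(f"[insert] inserting into {target}")
        return True

    return {"pick": pick, "insert": insert}


def run_assembly(instruction: str, scene_description: str) -> bool:
    """Select skills with the VLM, then execute them in order."""
    skills = make_skill_library()
    plan = query_vlm(instruction, scene_description)
    for step in plan:
        policy = skills.get(step.name)
        if policy is None or not policy(step.target):
            return False  # unknown skill or failed execution
    return True


if __name__ == "__main__":
    ok = run_assembly("Assemble the peg into the hole",
                      "A peg and a board with a hole on the table")
    print("success" if ok else "failure")
```

The explicit skill plan returned by the VLM is what gives the approach its interpretability: each executed primitive can be inspected and logged, rather than hiding the whole task inside one end-to-end policy.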
Similar Papers
Perceiving, Reasoning, Adapting: A Dual-Layer Framework for VLM-Guided Precision Robotic Manipulation
Robotics
Robots learn to do tricky jobs with speed and accuracy.
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey
Robotics
Robots learn to do tasks by watching and listening.
Rethinking Intermediate Representation for VLM-based Robot Manipulation
Robotics
Helps robots understand and do new tasks.