Learning Semantic Atomic Skills for Multi-Task Robotic Manipulation
By: Yihang Zhu, Weiqing Wang, Shijie Wu, and more
While imitation learning has shown impressive results in single-task robot manipulation, scaling it to multi-task settings remains a fundamental challenge due to issues such as suboptimal demonstrations, trajectory noise, and behavioral multi-modality. Existing skill-based methods attempt to address this by decomposing actions into reusable abstractions, but they often rely on fixed-length segmentation or environmental priors that limit semantic consistency and cross-task generalization. In this work, we propose AtomSkill, a novel multi-task imitation learning framework that learns and leverages a structured Atomic Skill Space for composable robot manipulation. Our approach is built on two key technical contributions. First, we construct a Semantically Grounded Atomic Skill Library by partitioning demonstrations into variable-length skills using gripper-state keyframe detection and vision-language model annotation. A contrastive learning objective ensures the resulting skill embeddings are both semantically consistent and temporally coherent. Second, we propose an Action Generation module with Keypose Imagination, which jointly predicts a skill's long-horizon terminal keypose and its immediate action sequence. This enables the policy to reason about overarching motion goals and fine-grained control simultaneously, facilitating robust skill chaining. Extensive experiments in simulated and real-world environments show that AtomSkill consistently outperforms state-of-the-art methods across diverse manipulation tasks.
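The abstract does not give implementation details, but the gripper-state keyframe detection used to partition demonstrations into variable-length skills can be pictured as cutting each trajectory wherever the gripper toggles between open and closed. Below is a minimal NumPy sketch of that idea; the 0.5 openness threshold, the min_len filter, and the function name segment_by_gripper_keyframes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def segment_by_gripper_keyframes(gripper_states, min_len=5):
    """Split a demonstration into variable-length skill segments at
    gripper open/close transitions (hypothetical thresholding scheme).

    gripper_states: (T,) array of gripper openness in [0, 1].
    Returns a list of (start, end) index pairs, one per candidate skill.
    """
    # Binarize the gripper signal and find the frames where it flips.
    closed = gripper_states < 0.5
    keyframes = np.flatnonzero(np.diff(closed.astype(int)) != 0) + 1

    # Cut the trajectory at each keyframe, discarding very short segments.
    boundaries = [0, *keyframes.tolist(), len(gripper_states)]
    segments = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        if end - start >= min_len:
            segments.append((start, end))
    return segments
```

Each resulting segment would then be passed to a vision-language model for annotation with a skill label, per the abstract's description of the Atomic Skill Library construction.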
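The contrastive objective that makes skill embeddings semantically consistent can likewise be illustrated with a generic supervised-contrastive loss, where segments sharing a VLM-assigned skill label act as positives. This is a sketch of one plausible instantiation, not the paper's exact loss; the temperature value and the function name skill_contrastive_loss are assumptions.

```python
import torch
import torch.nn.functional as F

def skill_contrastive_loss(embeddings, skill_labels, temperature=0.1):
    """SupCon-style sketch: pull together embeddings of segments that share
    a skill label, push apart the rest.

    embeddings: (N, D) skill embeddings; skill_labels: (N,) integer labels.
    """
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.t() / temperature                    # pairwise cosine similarities
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, float('-inf'))  # exclude self-pairs

    # Log-probability of each pair under a softmax over the anchor's row.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    positives = (skill_labels.unsqueeze(0) == skill_labels.unsqueeze(1)) & ~mask_self

    # Average log-likelihood of positives for each anchor that has at least one.
    pos_counts = positives.sum(1).clamp(min=1)
    loss_per_anchor = -(log_prob.masked_fill(~positives, 0.0)).sum(1) / pos_counts
    return loss_per_anchor[positives.any(1)].mean()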
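Finally, the Keypose Imagination idea of jointly predicting a skill's terminal keypose and its immediate action sequence from a shared representation could look roughly like the PyTorch head below. All dimensions, layer sizes, and names are illustrative assumptions; the abstract does not specify the actual architecture.

```python
import torch
import torch.nn as nn

class KeyposeImaginationHead(nn.Module):
    """Minimal sketch of a policy head that jointly predicts the skill's
    long-horizon terminal keypose and a short immediate action chunk from
    a shared observation/skill embedding.
    """

    def __init__(self, embed_dim=512, action_dim=7, chunk_len=8):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        # Long-horizon goal: end-effector pose at the end of the skill.
        self.keypose_head = nn.Linear(256, action_dim)
        # Short-horizon control: the next chunk_len actions.
        self.action_head = nn.Linear(256, chunk_len * action_dim)
        self.chunk_len, self.action_dim = chunk_len, action_dim

    def forward(self, z):
        h = self.trunk(z)
        keypose = self.keypose_head(h)
        actions = self.action_head(h).view(-1, self.chunk_len, self.action_dim)
        return keypose, actions
```

Training both heads from the same trunk is what lets the policy reason about the overarching motion goal and fine-grained control simultaneously, which the abstract credits for robust skill chaining.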