Score: 1

RL-Driven Data Generation for Robust Vision-Based Dexterous Grasping

Published: April 25, 2025 | arXiv ID: 2504.18084v1

By: Atsushi Kanehira, Naoki Wake, Kazuhiro Sasabuchi, and more

BigTech Affiliations: Microsoft

Potential Business Impact:

Teaches robots to grasp many different object shapes, using only a few real demonstrations to seed large-scale simulated training data.

Business Areas:
Image Recognition, Data and Analytics, Software

This work presents reinforcement learning (RL)-driven data augmentation to improve the generalization of vision-action (VA) models for dexterous grasping. While real-to-sim-to-real frameworks, where a few real demonstrations seed large-scale simulated data, have proven effective for VA models, applying them to dexterous settings remains challenging: obtaining stable multi-finger contacts is nontrivial across diverse object shapes. To address this, we leverage RL to generate contact-rich grasping data across varied geometries. In line with the real-to-sim-to-real paradigm, the grasp skill is formulated as a parameterized and tunable reference trajectory refined by a residual policy learned via RL. This modular design enables trajectory-level control that is both consistent with real demonstrations and adaptable to diverse object geometries. A vision-conditioned policy trained on simulation-augmented data demonstrates strong generalization to unseen objects, highlighting the potential of our approach to alleviate the data bottleneck in training VA models.
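To make the trajectory-plus-residual formulation concrete, here is a minimal Python sketch of the idea described in the abstract: a grasp is a tunable reference trajectory (seeded from a demonstration) plus a learned residual correction. All names, shapes, and the linear residual map (`ParameterizedReferenceTrajectory`, `residual_policy`, `rollout`, the dummy environment) are illustrative assumptions, not the authors' implementation; in the paper the residual is a policy trained via RL in simulation.

```python
import numpy as np

class ParameterizedReferenceTrajectory:
    """Reference grasp trajectory seeded from a real demonstration.

    `waypoints` is a (T, D) array of hand joint targets; `scale` and
    `offset` are tunable parameters that adapt the demo to a new object.
    (Hypothetical parameterization for illustration.)
    """

    def __init__(self, waypoints, scale=1.0, offset=None):
        self.waypoints = waypoints
        self.scale = scale
        self.offset = offset if offset is not None else np.zeros(waypoints.shape[1])

    def __call__(self, t):
        # Hold the final waypoint once the trajectory is exhausted.
        ref = self.waypoints[min(t, len(self.waypoints) - 1)]
        return self.scale * ref + self.offset

def residual_policy(obs, theta):
    """Stand-in for the RL-trained residual network: a linear map here."""
    return np.tanh(theta @ obs)

def rollout(env_step, obs0, traj, theta, horizon=50):
    """Execute reference + residual actions; the logged rollouts play the
    role of simulation-augmented data for training the vision policy."""
    obs, actions = obs0, []
    for t in range(horizon):
        action = traj(t) + residual_policy(obs, theta)  # residual refinement
        obs = env_step(action)
        actions.append(action)
    return np.stack(actions)

if __name__ == "__main__":
    T, D = 20, 16  # 20 waypoints, 16 hand DoFs (assumed)
    demo = np.linspace(0.0, 1.0, T)[:, None] * np.ones(D)
    traj = ParameterizedReferenceTrajectory(demo, scale=0.9)
    theta = 0.01 * np.random.randn(D, D)
    dummy_env = lambda a: a + 0.01 * np.random.randn(D)  # stand-in dynamics
    print(rollout(dummy_env, np.zeros(D), traj, theta).shape)  # (50, 16)
```

Keeping the reference trajectory explicit and tunable is what makes the design modular: the same demonstration-consistent skeleton can be re-parameterized per object, while the residual term absorbs the contact-level corrections that are hard to hand-specify.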

Country of Origin
🇺🇸 United States

Page Count
7 pages

Category
Computer Science:
Robotics