OVGrasp: Open-Vocabulary Grasping Assistance via Multimodal Intent Detection
By: Chen Hu, Shan Luo, Letizia Gionfrida
Potential Business Impact:
Helps people with weak hands pick up anything.
Grasping assistance is essential for restoring autonomy in individuals with motor impairments, particularly in unstructured environments where object categories and user intentions are diverse and unpredictable. We present OVGrasp, a hierarchical control framework for soft exoskeleton-based grasp assistance that integrates RGB-D vision, open-vocabulary prompts, and voice commands to enable robust multimodal interaction. To enhance generalization in open environments, OVGrasp incorporates a vision-language foundation model with an open-vocabulary mechanism, allowing zero-shot detection of previously unseen objects without retraining. A multimodal decision-maker further fuses spatial and linguistic cues to infer user intent, such as grasp or release, in multi-object scenarios. We deploy the complete framework on a custom egocentric-view wearable exoskeleton and conduct systematic evaluations on 15 objects across three grasp types. Experimental results with ten participants demonstrate that OVGrasp achieves a grasping ability score (GAS) of 87.00%, outperforming state-of-the-art baselines while aligning more closely with natural hand kinematics.
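To make the fusion idea concrete, below is a minimal sketch of how a multimodal decision-maker might combine open-vocabulary detections with a voice command to choose between grasp and release and to pick a target among multiple objects. The function and class names (Detection, infer_intent), the fusion weights, and the token-overlap similarity are illustrative assumptions, not the paper's implementation; OVGrasp uses a vision-language foundation model for detection and language grounding rather than the simple stand-ins shown here.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class Detection:
    """One open-vocabulary detection: text label, confidence, and 3D centroid (m)."""
    label: str
    score: float
    position: np.ndarray  # (x, y, z) in the egocentric camera frame


def label_similarity(a: str, b: str) -> float:
    """Crude token-overlap similarity standing in for embedding-based text matching."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)


def infer_intent(command: str, detections: list[Detection],
                 hand_position: np.ndarray,
                 w_lang: float = 0.6, w_dist: float = 0.4):
    """Fuse linguistic and spatial cues into an (action, target) decision.

    Returns ('release', None) for release-type commands; otherwise picks the
    detection that best balances label match against proximity to the hand.
    """
    if any(k in command.lower() for k in ("release", "let go", "drop")):
        return "release", None

    best, best_score = None, -np.inf
    for det in detections:
        lang = label_similarity(command, det.label) * det.score
        dist = np.linalg.norm(det.position - hand_position)
        spatial = 1.0 / (1.0 + dist)  # closer objects score higher
        score = w_lang * lang + w_dist * spatial
        if score > best_score:
            best, best_score = det, score
    return "grasp", best


# Example: two candidate objects in view, user says "grab the water bottle".
detections = [
    Detection("water bottle", 0.91, np.array([0.25, 0.00, 0.40])),
    Detection("coffee mug",   0.88, np.array([0.10, 0.05, 0.30])),
]
action, target = infer_intent("grab the water bottle", detections,
                              hand_position=np.array([0.0, 0.0, 0.35]))
print(action, target.label if target else None)
```

In this toy setup the language term dominates, so the water bottle is selected even though the mug is closer; weighting spatial against linguistic evidence in this way is one plausible reading of how the decision-maker disambiguates targets in multi-object scenes.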
Similar Papers
OVAL-Grasp: Open-Vocabulary Affordance Localization for Task Oriented Grasping
Robotics
Robots grasp objects correctly for any task.
OmniDexVLG: Learning Dexterous Grasp Generation from Vision Language Model-Guided Grasp Semantics, Taxonomy and Functional Affordance
Robotics
Lets robots pick up anything, any way.
Point Cloud-based Grasping for Soft Hand Exoskeleton
Robotics
Helps robots grasp objects better using sight.