HOGraspFlow: Exploring Vision-based Generative Grasp Synthesis with Hand-Object Priors and Taxonomy Awareness
By: Yitian Shi, Zicheng Guo, Rosa Wolf, and more
Potential Business Impact:
Robots learn to grab anything by watching humans.
We propose Hand-Object (HO) GraspFlow, an affordance-centric approach that retargets a single RGB image of a hand-object interaction (HOI) into multi-modal, executable parallel-jaw grasps without explicit geometric priors on the target objects. Building on foundation models for hand reconstruction and vision, we synthesize $SE(3)$ grasp poses with denoising flow matching (FM), conditioned on three complementary cues: RGB foundation features as visual semantics, HOI contact reconstruction, and a taxonomy-aware prior on grasp types. Our approach achieves high fidelity in grasp synthesis without explicit HOI contact input or object geometry, while maintaining strong contact and taxonomy recognition. A controlled comparison shows that HOGraspFlow consistently outperforms diffusion-based variants (HOGraspDiff), achieving high distributional fidelity and more stable optimization in $SE(3)$. In real-world experiments we demonstrate reliable, object-agnostic grasp synthesis from human demonstrations, achieving an average success rate of over $83\%$.
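The core generative step described above, denoising flow matching over $SE(3)$ conditioned on visual, contact, and taxonomy cues, can be illustrated with a minimal sketch. The snippet below assumes a simplified Euclidean parameterization of the grasp pose (3D translation plus a 6D rotation representation) and hypothetical conditioning dimensions; the paper's actual $SE(3)$ formulation, feature encoders, and network architecture are not reproduced here, so all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the paper does not specify these in the abstract.
POSE_DIM = 9               # 3D translation + 6D rotation representation (assumption)
COND_DIM = 512 + 64 + 16   # visual features + contact encoding + taxonomy prior (assumption)


class GraspVelocityField(nn.Module):
    """Minimal conditional velocity field v_theta(x_t, t, c) for flow matching."""

    def __init__(self, pose_dim: int = POSE_DIM, cond_dim: int = COND_DIM, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim + 1 + cond_dim, hidden),
            nn.SiLU(),
            nn.Linear(hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, pose_dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x_t: (B, pose_dim), t: (B, 1), cond: (B, cond_dim)
        return self.net(torch.cat([x_t, t, cond], dim=-1))


def fm_loss(model: GraspVelocityField, x1: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
    """Rectified-flow style FM objective: regress the straight-line velocity x1 - x0."""
    x0 = torch.randn_like(x1)                              # noise endpoint
    t = torch.rand(x1.shape[0], 1, device=x1.device)       # uniform time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1                          # linear interpolation path
    v_target = x1 - x0                                     # constant target velocity
    v_pred = model(x_t, t, cond)
    return ((v_pred - v_target) ** 2).mean()


@torch.no_grad()
def sample_grasp(model: GraspVelocityField, cond: torch.Tensor, steps: int = 50) -> torch.Tensor:
    """Integrate the learned ODE from Gaussian noise to a grasp-pose vector (Euler steps)."""
    x = torch.randn(cond.shape[0], POSE_DIM, device=cond.device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((cond.shape[0], 1), i * dt, device=cond.device)
        x = x + dt * model(x, t, cond)
    return x
```

Sampling integrates the learned velocity field from noise to a pose vector; in practice the 6D rotation component would be projected back onto $SO(3)$ (e.g., via Gram-Schmidt orthogonalization) before the grasp is executed.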
Similar Papers
TOUCH: Text-guided Controllable Generation of Free-Form Hand-Object Interactions
CV and Pattern Recognition
Lets robots do many hand actions, not just grab.
Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors
Graphics
Creates realistic 3D actions from text descriptions.
MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips
CV and Pattern Recognition
Shows hidden parts of objects in videos.