Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos
By: Mingfei Chen, Yifan Wang, Zhengqin Li, and more
Potential Business Impact:
Helps robots predict hand movements by watching people interact.
Prior work on 3D hand trajectory prediction is constrained by datasets that decouple motion from semantic supervision and by models that only weakly link reasoning to action. To address these limitations, we first present the EgoMAN dataset, a large-scale egocentric dataset for interaction stage-aware 3D hand trajectory prediction with 219K 6DoF trajectories and 3M structured QA pairs for semantic, spatial, and motion reasoning. We then introduce the EgoMAN model, a reasoning-to-motion framework that links vision-language reasoning and motion generation through a trajectory-token interface. Trained progressively to align reasoning with motion dynamics, our approach yields accurate, stage-aware trajectories that generalize across real-world scenes.
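To make the idea of a trajectory-token interface concrete, here is a minimal sketch of how learned trajectory tokens from a vision-language backbone could be decoded into 6DoF waypoints. All names, shapes, and the pose parameterization (xyz plus quaternion) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a trajectory-token interface between a vision-language
# backbone and a motion head. Module names, dimensions, and the 6DoF
# parameterization are assumptions for illustration, not the EgoMAN code.
import torch
import torch.nn as nn


class TrajectoryTokenHead(nn.Module):
    """Maps the hidden states of learned trajectory tokens to 6DoF waypoints."""

    def __init__(self, d_model: int = 1024, num_traj_tokens: int = 16, horizon: int = 30):
        super().__init__()
        # Learned query tokens appended to the VLM input sequence; their
        # contextualized hidden states act as the reasoning-to-motion interface.
        self.traj_queries = nn.Parameter(torch.randn(num_traj_tokens, d_model) * 0.02)
        self.horizon = horizon
        self.mlp = nn.Sequential(
            nn.Linear(num_traj_tokens * d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, horizon * 7),  # per-step [x, y, z, qw, qx, qy, qz]
        )

    def forward(self, traj_token_states: torch.Tensor) -> torch.Tensor:
        # traj_token_states: (batch, num_traj_tokens, d_model) hidden states
        # produced by the vision-language backbone for the trajectory queries.
        b = traj_token_states.shape[0]
        out = self.mlp(traj_token_states.reshape(b, -1))
        waypoints = out.view(b, self.horizon, 7)
        # Normalize the quaternion part so each waypoint is a valid rotation.
        pos, quat = waypoints[..., :3], waypoints[..., 3:]
        quat = quat / quat.norm(dim=-1, keepdim=True).clamp_min(1e-8)
        return torch.cat([pos, quat], dim=-1)


if __name__ == "__main__":
    head = TrajectoryTokenHead()
    hidden = torch.randn(2, 16, 1024)  # stand-in for VLM trajectory-token states
    print(head(hidden).shape)          # torch.Size([2, 30, 7])
```

In this sketch the trajectory tokens play the same bridging role described in the abstract: the language model does the semantic and spatial reasoning, and a lightweight head turns its token states into a stage-aware 6DoF hand trajectory.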
Similar Papers
Ego-centric Predictive Model Conditioned on Hand Trajectories
CV and Pattern Recognition
Predicts actions and what happens next.
Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views
CV and Pattern Recognition
Finds exact moments hands touch objects.