ECHO: Ego-Centric modeling of Human-Object interactions
By: Ilya A. Petrov, Vladimir Guzov, Riccardo Marin and more
Potential Business Impact:
Tracks what you're doing with your hands.
Modeling human-object interactions (HOI) from an egocentric perspective is a largely unexplored yet important problem due to the increasing adoption of wearable devices, such as smart glasses and watches. We investigate how much information about an interaction can be recovered from head and wrist tracking alone. Our answer is ECHO (Ego-Centric modeling of Human-Object interactions), the first unified framework to recover three modalities from such minimal observations: human pose, object motion, and contact. ECHO employs a Diffusion Transformer architecture and a three-variate diffusion process that jointly models human motion, object trajectory, and contact sequence, allowing for flexible input configurations. Our method operates in a head-centric canonical space, which makes it more robust to global orientation. We propose a conveyor-based inference scheme that progressively increases the diffusion timestep with the frame position, allowing us to process sequences of any length. Through extensive evaluation, we demonstrate that ECHO outperforms existing methods, which do not offer the same flexibility, setting a new state of the art in egocentric HOI reconstruction.
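The conveyor-based inference mentioned in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract only states that the diffusion timestep grows with frame position, so the linear schedule, the sliding-window buffer, and every name below (`conveyor_timesteps`, `toy_denoiser`, `conveyor_inference`, `window`, `num_steps`) are assumptions made for illustration. The learned denoiser is replaced by a placeholder that simply shrinks each frame toward zero.

```python
import numpy as np


def conveyor_timesteps(window: int, num_steps: int) -> np.ndarray:
    """Per-frame timesteps that grow linearly with position in the window.

    Assumption: a linear schedule; the abstract does not specify its shape.
    """
    if num_steps % window != 0:
        raise ValueError("this sketch assumes num_steps is divisible by window")
    stride = num_steps // window
    # Frame 0 is closest to clean, the last frame is at the maximum noise level.
    return stride * np.arange(1, window + 1)


def toy_denoiser(x: np.ndarray, t_from: np.ndarray, t_to: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a learned model: scales each frame by t_to / t_from."""
    return x * (t_to / t_from)[:, None]


def conveyor_inference(num_frames: int, window: int = 4, num_steps: int = 8,
                       dim: int = 3, seed: int = 0) -> np.ndarray:
    """Emit num_frames frames one at a time from a fixed-size noisy buffer."""
    rng = np.random.default_rng(seed)
    t = conveyor_timesteps(window, num_steps)
    stride = num_steps // window
    # The buffer holds `window` frames; frame i sits at noise level t[i].
    buffer = rng.standard_normal((window, dim))
    outputs = []
    for _ in range(num_frames):
        # Move every frame down by one stride of noise levels.
        buffer = toy_denoiser(buffer, t, t - stride)
        # The front frame has reached timestep 0, so it leaves the conveyor.
        outputs.append(buffer[0].copy())
        # Shift the remaining frames forward and append a fresh fully noisy frame.
        fresh = rng.standard_normal((1, dim))
        buffer = np.concatenate([buffer[1:], fresh], axis=0)
    return np.stack(outputs)
```

Because the buffer size is fixed and one frame is emitted per iteration, memory use in this sketch does not depend on `num_frames`, which is the property that lets a scheme of this kind handle sequences of any length.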
Similar Papers
Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views
CV and Pattern Recognition
Finds exact moments hands touch objects.
Egocentric Human-Object Interaction Detection: A New Benchmark and Method
CV and Pattern Recognition
Helps robots see what hands are doing.
CoopDiff: Anticipating 3D Human-object Interactions via Contact-consistent Decoupled Diffusion
CV and Pattern Recognition
Predicts how people and things will move together.