TRec: Egocentric Action Recognition using 2D Point Tracks
By: Dennis Holzmann, Sven Wachsmuth
Potential Business Impact:
Tracks moving dots to understand what you're doing.
We present a novel approach for egocentric action recognition that leverages 2D point tracks as an additional motion cue. While most existing methods rely on RGB appearance, human pose estimation, or their combination, our work demonstrates that tracking randomly sampled image points across video frames can substantially improve recognition accuracy. Unlike prior approaches, we do not detect hands, objects, or interaction regions. Instead, we employ CoTracker to follow a set of randomly initialized points through each video and use the resulting trajectories, together with the corresponding image frames, as input to a Transformer-based recognition model. Surprisingly, our method achieves notable gains even when only the initial frame and its associated point tracks are provided, without incorporating the full video sequence. Experimental results confirm that integrating 2D point tracks consistently enhances performance compared to the same model trained without motion information, highlighting their potential as a lightweight yet effective representation for egocentric action understanding.
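The pipeline the abstract describes can be sketched in a few lines of PyTorch. In the sketch below, the `torch.hub` entry point and the `cotracker(video, queries=...)` call follow the public CoTracker repository; everything else — the random query sampling, the `TrackActionClassifier` head, and its dimensions — is a hypothetical illustration of the tracks-plus-initial-frame variant, not the authors' released model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# Dummy clip standing in for a real egocentric video: (B, T, C, H, W), values in [0, 255].
B, T, C, H, W = 1, 16, 3, 256, 256
video = torch.randint(0, 255, (B, T, C, H, W), dtype=torch.float32, device=device)

# Randomly initialize N query points on the first frame; CoTracker queries are (t, x, y).
N = 64
scale = torch.tensor([W, H], dtype=torch.float32, device=device)
queries = torch.cat(
    [torch.zeros(B, N, 1, device=device),           # t = 0: every point starts on frame 0
     torch.rand(B, N, 2, device=device) * scale],   # (x, y) sampled uniformly over the image
    dim=-1,
)

# Pretrained CoTracker from torch.hub (entry point per the CoTracker README).
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2").to(device)
with torch.no_grad():
    pred_tracks, pred_visibility = cotracker(video, queries=queries)  # tracks: (B, T, N, 2)

class TrackActionClassifier(nn.Module):
    """Hypothetical head: one token per point trajectory plus one first-frame token."""
    def __init__(self, num_frames, num_classes, dim=128):
        super().__init__()
        self.track_proj = nn.Linear(num_frames * 2, dim)   # flattened (x, y) trajectory
        self.frame_proj = nn.Linear(3 * 32 * 32, dim)      # crude first-frame embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, tracks, frame0):                     # tracks: (B, T, N, 2)
        b, t, n, _ = tracks.shape
        point_tokens = self.track_proj(tracks.permute(0, 2, 1, 3).reshape(b, n, t * 2))
        frame_token = self.frame_proj(
            F.interpolate(frame0, size=(32, 32)).reshape(b, -1)
        ).unsqueeze(1)
        x = self.encoder(torch.cat([frame_token, point_tokens], dim=1))
        return self.cls(x.mean(dim=1))                     # mean-pool tokens -> class logits

model = TrackActionClassifier(num_frames=T, num_classes=10).to(device)
logits = model(pred_tracks / scale, video[:, 0] / 255.0)   # normalized tracks + frame 0
```

Representing each point's whole trajectory as a single token keeps the sequence length at N + 1 regardless of clip length, which is one simple way to keep such a head lightweight; the paper's actual tokenization and frame encoding may well differ.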
Similar Papers
Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos
CV and Pattern Recognition
Helps robots predict hand movements by watching.
Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition
CV and Pattern Recognition
Teaches robots to understand actions by watching.