TRec: Egocentric Action Recognition using 2D Point Tracks
By: Dennis Holzmann, Sven Wachsmuth
Potential Business Impact:
Tracks moving dots to understand what you're doing.
We present a novel approach for egocentric action recognition that leverages 2D point tracks as an additional motion cue. While most existing methods rely on RGB appearance, human pose estimation, or their combination, our work demonstrates that tracking randomly sampled image points across video frames can substantially improve recognition accuracy. Unlike prior approaches, we do not detect hands, objects, or interaction regions. Instead, we employ CoTracker to follow a set of randomly initialized points through each video and use the resulting trajectories, together with the corresponding image frames, as input to a Transformer-based recognition model. Surprisingly, our method achieves notable gains even when only the initial frame and its associated point tracks are provided, without incorporating the full video sequence. Experimental results confirm that integrating 2D point tracks consistently enhances performance compared to the same model trained without motion information, highlighting their potential as a lightweight yet effective representation for egocentric action understanding.
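The pipeline the abstract describes can be sketched in a few lines of PyTorch. In the sketch below, the `torch.hub` entry point and the `cotracker(video, queries=...)` call follow the public CoTracker repository; everything else — the random query sampling, the `TrackActionClassifier` head, and its dimensions — is a hypothetical illustration of the tracks-plus-initial-frame variant, not the authors' released model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# Dummy clip standing in for a real egocentric video: (B, T, C, H, W), values in [0, 255].
B, T, C, H, W = 1, 16, 3, 256, 256
video = torch.randint(0, 255, (B, T, C, H, W), dtype=torch.float32, device=device)

# Randomly initialize N query points on the first frame; CoTracker queries are (t, x, y).
N = 64
scale = torch.tensor([W, H], dtype=torch.float32, device=device)
queries = torch.cat(
    [torch.zeros(B, N, 1, device=device),           # t = 0: every point starts on frame 0
     torch.rand(B, N, 2, device=device) * scale],   # (x, y) sampled uniformly over the image
    dim=-1,
)

# Pretrained CoTracker from torch.hub (entry point per the CoTracker README).
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2").to(device)
with torch.no_grad():
    pred_tracks, pred_visibility = cotracker(video, queries=queries)  # tracks: (B, T, N, 2)

class TrackActionClassifier(nn.Module):
    """Hypothetical head: one token per point trajectory plus one first-frame token."""
    def __init__(self, num_frames, num_classes, dim=128):
        super().__init__()
        self.track_proj = nn.Linear(num_frames * 2, dim)   # flattened (x, y) trajectory
        self.frame_proj = nn.Linear(3 * 32 * 32, dim)      # crude first-frame embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, tracks, frame0):                     # tracks: (B, T, N, 2)
        b, t, n, _ = tracks.shape
        point_tokens = self.track_proj(tracks.permute(0, 2, 1, 3).reshape(b, n, t * 2))
        frame_token = self.frame_proj(
            F.interpolate(frame0, size=(32, 32)).reshape(b, -1)
        ).unsqueeze(1)
        x = self.encoder(torch.cat([frame_token, point_tokens], dim=1))
        return self.cls(x.mean(dim=1))                     # mean-pool tokens -> class logits

model = TrackActionClassifier(num_frames=T, num_classes=10).to(device)
logits = model(pred_tracks / scale, video[:, 0] / 255.0)   # normalized tracks + frame 0
```

Representing each point's whole trajectory as a single token keeps the sequence length at N + 1 regardless of clip length, which is one simple way to keep such a head lightweight; the paper's actual tokenization and frame encoding may well differ.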
Similar Papers
Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos
CV and Pattern Recognition
Helps robots predict hand movements by watching.
Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition
CV and Pattern Recognition
Teaches robots to understand actions by watching.