Towards Generalizing Temporal Action Segmentation to Unseen Views

Published: April 3, 2025 | arXiv ID: 2504.02512v1

By: Emad Bahrami, Olga Zatsarynna, Gianpiero Francesca and more

Potential Business Impact:

Helps models recognize actions in videos recorded from new camera angles.

Business Areas:
Motion Capture, Media and Entertainment, Video

While there has been substantial progress in temporal action segmentation, the challenge of generalizing to unseen views remains unaddressed. Hence, we define a protocol for unseen view action segmentation where the camera views used for evaluating the model are unavailable during training. This includes changing from top-frontal views to a side view or, even more challenging, from exocentric to egocentric views. Furthermore, we present an approach for temporal action segmentation that tackles this challenge. Our approach leverages a shared representation at both the sequence and segment levels to reduce the impact of view differences during training. We achieve this by introducing a sequence loss and an action loss, which together facilitate consistent video and action representations across different views. The evaluation on the Assembly101, IkeaASM, and EgoExoLearn datasets demonstrates significant improvements, with a 12.8% increase in F1@50 for unseen exocentric views and a substantial 54% improvement for unseen egocentric views.
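
The abstract reports F1@50 without defining it. In the action segmentation literature this usually denotes the segmental F1 score at an intersection-over-union (IoU) threshold of 50%: a predicted segment counts as a true positive if it overlaps an unmatched ground-truth segment of the same class with IoU of at least 0.5. The following is a minimal sketch of that common definition, not the paper's evaluation code; the function names `segments` and `f1_at_k` are hypothetical, and the paper's exact protocol (for example, background handling or averaging across videos) may differ.

```python
from itertools import groupby


def segments(labels):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    out, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        out.append((label, start, start + length))
        start += length
    return out


def f1_at_k(pred, gt, threshold=0.5):
    """Segmental F1 for one video at the given IoU threshold (0.5 for F1@50)."""
    pred_segs, gt_segs = segments(pred), segments(gt)
    matched = [False] * len(gt_segs)  # each ground-truth segment matches at most once
    tp = fp = 0
    for label, ps, pe in pred_segs:
        best_iou, best_idx = 0.0, -1
        for i, (gl, gs, ge) in enumerate(gt_segs):
            if gl != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            # The covering span equals the union whenever the segments overlap;
            # without overlap, inter is 0 and so is the IoU.
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, i
        if best_idx >= 0 and best_iou >= threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = len(gt_segs) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy frame-wise labels (made up for illustration, not from any dataset).
gt = [0, 0, 0, 0, 1, 1, 1, 1]
slightly_shifted = [0, 0, 0, 1, 1, 1, 1, 1]   # boundary off by one frame
over_segmented = [0, 1, 0, 1, 0, 1, 0, 1]     # many short spurious segments
print(f1_at_k(slightly_shifted, gt), f1_at_k(over_segmented, gt))
```

Because the score is computed over segments rather than frames, a small boundary shift is tolerated while over-segmentation is penalized heavily, which is why it is often reported alongside frame-wise accuracy.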

Page Count
22 pages

Category
Computer Science:
CV and Pattern Recognition