Towards Generalizing Temporal Action Segmentation to Unseen Views
By: Emad Bahrami, Olga Zatsarynna, Gianpiero Francesca and more
Potential Business Impact:
Helps computers recognize actions in videos filmed from new camera angles.
While there has been substantial progress in temporal action segmentation, the challenge of generalizing to unseen views remains unaddressed. Hence, we define a protocol for unseen view action segmentation where the camera views used to evaluate the model are unavailable during training. This includes changing from top-frontal views to a side view or, even more challenging, from exocentric to egocentric views. Furthermore, we present an approach for temporal action segmentation that tackles this challenge. Our approach leverages a shared representation at both the sequence and segment levels to reduce the impact of view differences during training. We achieve this by introducing a sequence loss and an action loss, which together facilitate consistent video and action representations across different views. The evaluation on the Assembly101, IkeaASM, and EgoExoLearn datasets demonstrates significant improvements, with a 12.8% increase in F1@50 for unseen exocentric views and a substantial 54% improvement for unseen egocentric views.
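The F1@50 figure quoted above is a segment-level metric: a predicted segment counts as correct only if it overlaps a ground-truth segment of the same action with an intersection-over-union (IoU) of at least 50%. The following is a minimal sketch of how such a score is commonly computed, not the authors' evaluation code. The function names `to_segments` and `segmental_f1` are hypothetical, and the sketch assumes every label is scored, whereas some benchmarks exclude a background class.

```python
from itertools import groupby


def to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive."""
    segments, start = [], 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def segmental_f1(predicted, ground_truth, iou_threshold=0.5):
    """Segmental F1 at a given IoU threshold (0.5 corresponds to F1@50).

    Assumption: all labels are scored; no background class is excluded.
    """
    pred_segs = to_segments(predicted)
    gt_segs = to_segments(ground_truth)
    matched = [False] * len(gt_segs)  # each ground-truth segment matches once
    true_pos = 0

    for label, p_start, p_end in pred_segs:
        # Find the same-label ground-truth segment with the highest IoU.
        best_iou, best_idx = 0.0, -1
        for idx, (g_label, g_start, g_end) in enumerate(gt_segs):
            if g_label != label:
                continue
            inter = max(0, min(p_end, g_end) - max(p_start, g_start))
            union = max(p_end, g_end) - min(p_start, g_start)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        # A repeated hit on an already matched segment is a false positive.
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[best_idx]:
            matched[best_idx] = True
            true_pos += 1

    false_pos = len(pred_segs) - true_pos
    false_neg = len(gt_segs) - true_pos
    precision = true_pos / (true_pos + false_pos) if pred_segs else 0.0
    recall = true_pos / (true_pos + false_neg) if gt_segs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truth = ["pick"] * 4 + ["screw"] * 4
    guess = ["pick"] * 3 + ["screw"] * 5
    print(segmental_f1(guess, truth, iou_threshold=0.5))
```

Because the score is computed over segments rather than frames, it penalizes over-segmentation: a prediction that fragments one long action into several short pieces produces extra false positives even when most frames are labeled correctly.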
Similar Papers
Pose-Aware Weakly-Supervised Action Segmentation
CV and Pattern Recognition
Teaches computers to understand actions in videos.
Looking into the Unknown: Exploring Action Discovery for Segmentation of Known and Unknown Actions
CV and Pattern Recognition
Finds hidden actions in videos using known ones.
Unsupervised Ego- and Exo-centric Dense Procedural Activity Captioning via Gaze Consensus Adaptation
Multimedia
Helps computers understand actions from different views.