Looking into the Unknown: Exploring Action Discovery for Segmentation of Known and Unknown Actions
By: Federico Spurio, Emad Bahrami, Olga Zatsarynna and more
Potential Business Impact:
Finds hidden actions in videos using known ones.
We introduce Action Discovery, a novel setup within Temporal Action Segmentation that addresses two challenges in partially labeled datasets: actions that are ambiguous and therefore hard to define and annotate, and annotations that are incomplete. In this setup, only a subset of actions (referred to as known actions) is annotated in the training data, while other, unknown actions remain unlabeled. This scenario is particularly relevant in domains like neuroscience, where well-defined behaviors (e.g., walking, eating) coexist with subtle or infrequent actions that are often overlooked, as well as in applications where datasets are inherently partially annotated due to ambiguous or missing labels. To address this problem, we propose a two-step approach that leverages the known annotations to guide both the temporal and semantic granularity of unknown action segments. First, we introduce the Granularity-Guided Segmentation Module (GGSM), which identifies temporal intervals for both known and unknown actions by mimicking the granularity of annotated actions. Second, we propose the Unknown Action Segment Assignment (UASA), which identifies semantically meaningful classes within the unknown actions, based on learned embedding similarities. We systematically explore the proposed setting of Action Discovery on three challenging datasets (Breakfast, 50Salads, and Desktop Assembly), demonstrating that our method considerably improves upon existing baselines.
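To make the second step more concrete, here is a minimal sketch of grouping unknown segments by embedding similarity. It is not the authors' UASA implementation: the abstract does not specify the assignment procedure, so the greedy cosine-similarity clustering, the function name assign_unknown_segments, and the similarity_threshold parameter are all assumptions made for illustration. It takes one embedding per unknown segment as given; learning those embeddings is not shown.

```python
import numpy as np


def assign_unknown_segments(segment_embeddings, similarity_threshold=0.8):
    """Greedily group unknown-action segments by cosine similarity.

    Hypothetical illustration only: the real UASA procedure is not
    described in the abstract. Each row of `segment_embeddings` is
    assumed to be one embedding per unknown segment.
    """
    embeddings = np.asarray(segment_embeddings, dtype=float)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # L2-normalize rows

    centroid_sums = []  # running sum of unit vectors per discovered class
    labels = []
    for vec in unit:
        best_label, best_sim = -1, -1.0
        for label, total in enumerate(centroid_sums):
            centroid = total / max(np.linalg.norm(total), 1e-12)
            sim = float(vec @ centroid)  # cosine similarity to class centroid
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_sim >= similarity_threshold:
            # Similar enough: join the closest existing class.
            centroid_sums[best_label] = centroid_sums[best_label] + vec
            labels.append(best_label)
        else:
            # Otherwise open a new discovered class.
            centroid_sums.append(vec.copy())
            labels.append(len(centroid_sums) - 1)
    return labels


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
    print(assign_unknown_segments(toy))
```

On the toy input, segments pointing in roughly the same direction end up sharing a class index. The threshold controls how fine-grained the discovered classes are, and the greedy pass depends on segment order. An actual system would more likely use an established clustering method.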
Similar Papers
From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings
CV and Pattern Recognition
Teaches robots to do jobs by watching videos.
Towards Generalizing Temporal Action Segmentation to Unseen Views
CV and Pattern Recognition
Helps videos understand actions from new angles.
UTAL-GNN: Unsupervised Temporal Action Localization using Graph Neural Networks
CV and Pattern Recognition
Finds exact moments of actions in videos.