Pose-Aware Weakly-Supervised Action Segmentation
By: Seth Z. Zhao, Reza Ghoddoosian, Isht Dwivedi, and others
Potential Business Impact:
Teaches computers to understand actions in videos.
Understanding human behavior is an important problem in the pursuit of visual intelligence. A key obstacle is the extensive and costly effort required to label action segments accurately. To address this issue, we consider learning methods that require minimal supervision to segment human actions in long instructional videos. Specifically, we introduce a weakly-supervised framework that uses pose knowledge during training but not during inference, thereby distilling the pose knowledge relevant to each action component into the model. As part of this framework, we propose a pose-inspired contrastive loss that trains the model to distinguish action boundaries more effectively. Extensive experiments on representative datasets show that our approach outperforms previous state-of-the-art (SOTA) methods in segmenting long instructional videos in both online and offline settings. We also demonstrate that the framework adapts to various segmentation backbones and pose extractors across different datasets.
Similar Papers
Cost-Sensitive Learning for Long-Tailed Temporal Action Segmentation
CV and Pattern Recognition
Helps computers understand actions in videos, even rare ones.
Synthetic Human Action Video Data Generation with Pose Transfer
CV and Pattern Recognition
Makes fake videos of people move realistically.
SpatioTemporal Learning for Human Pose Estimation in Sparsely-Labeled Videos
CV and Pattern Recognition
Teaches computers to track people in videos better.