MaskSem: Semantic-Guided Masking for Learning 3D Hybrid High-Order Motion Representation
By: Wei Wei , Shaojie Zhang , Yonghao Dang and more
Potential Business Impact:
Helps robots understand human movements better.
Human action recognition is a crucial task for intelligent robotics, particularly within the context of human-robot collaboration research. In self-supervised skeleton-based action recognition, the mask-based reconstruction paradigm learns the spatial structure and motion patterns of the skeleton by masking joints and reconstructing the target from unlabeled data. However, existing methods focus on a limited set of joints and low-order motion patterns, limiting the model's ability to understand complex motion patterns. To address this issue, we introduce MaskSem, a novel semantic-guided masking method for learning 3D hybrid high-order motion representations. This novel framework leverages Grad-CAM based on relative motion to guide the masking of joints, which can be represented as the most semantically rich temporal orgions. The semantic-guided masking process can encourage the model to explore more discriminative features. Furthermore, we propose using hybrid high-order motion as the reconstruction target, enabling the model to learn multi-order motion patterns. Specifically, low-order motion velocity and high-order motion acceleration are used together as the reconstruction target. This approach offers a more comprehensive description of the dynamic motion process, enhancing the model's understanding of motion patterns. Experiments on the NTU60, NTU120, and PKU-MMD datasets show that MaskSem, combined with a vanilla transformer, improves skeleton-based action recognition, making it more suitable for applications in human-robot interaction.
Similar Papers
Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples
CV and Pattern Recognition
Teaches computers to recognize actions with less data.
Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition
CV and Pattern Recognition
Teaches robots to understand actions by watching.
Heterogeneous Skeleton-Based Action Representation Learning
CV and Pattern Recognition
Helps robots understand actions from different body shapes.