Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer

Published: April 21, 2025 | arXiv ID: 2504.14860v1

By: Ziyi Liu, Yangcen Liu

Potential Business Impact:

Teaches computers to find actions in videos.

Business Areas:

A/B Testing Data and Analytics

Weakly-supervised Temporal Action Localization (WTAL) has achieved notable success but still suffers from a lack of temporal annotations, leading to a gap in both performance and framework design compared with fully-supervised methods. While recent approaches employ pseudo labels for training, three key challenges remain unresolved: generating high-quality pseudo labels, making full use of different priors, and optimizing training methods with noisy labels. To address these challenges, we propose PseudoFormer, a novel two-branch framework that bridges the gap between weakly and fully-supervised Temporal Action Localization (TAL). We first introduce RickerFusion, which maps all predicted action proposals to a global shared space to generate higher-quality pseudo labels. Subsequently, we leverage both snippet-level and proposal-level labels with different priors from the weak branch to train the regression-based model in the full branch. Finally, an uncertainty mask and an iterative refinement mechanism are applied for training with noisy pseudo labels. PseudoFormer achieves state-of-the-art WTAL results on the two commonly used benchmarks, THUMOS14 and ActivityNet1.3. In addition, extensive ablation studies demonstrate the contribution of each component of our method.
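
The abstract does not give the RickerFusion formula, so the following is only an illustrative sketch of the general idea of fusing proposals on a shared temporal axis and masking uncertain snippets. It assumes, from the name alone, that each proposal is projected as a Ricker (Mexican-hat) wavelet whose zero crossings sit at the proposal boundaries. The function names, the thresholds, and the toy proposals are all hypothetical and are not taken from the paper.

```python
import math

def ricker(t, center, width):
    """Ricker (Mexican-hat) wavelet with peak 1 at `center`.
    It crosses zero at center +/- width and is negative beyond that."""
    x = (t - center) / width
    return (1.0 - x * x) * math.exp(-0.5 * x * x)

def fuse_proposals(proposals, num_snippets):
    """Sum score-weighted wavelets of all proposals on one shared axis.
    `proposals` is a list of (start, end, score) in snippet units (assumed format)."""
    fused = [0.0] * num_snippets
    for start, end, score in proposals:
        center = 0.5 * (start + end)
        width = max(0.5 * (end - start), 1e-6)  # half the proposal length
        for t in range(num_snippets):
            fused[t] += score * ricker(t, center, width)
    return fused

def segments_above(fused, threshold):
    """Return contiguous [start, end) runs whose fused score exceeds `threshold`."""
    segments, start = [], None
    for t, value in enumerate(fused):
        if value > threshold and start is None:
            start = t
        elif value <= threshold and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(fused)))
    return segments

def uncertainty_mask(fused, low, high):
    """1 keeps a snippet in the loss; 0 ignores it because its fused
    score falls in the ambiguous band (low, high)."""
    return [0 if low < value < high else 1 for value in fused]

# Toy example: two overlapping proposals over a 10-snippet video.
proposals = [(2, 6, 0.9), (3, 7, 0.8)]
fused = fuse_proposals(proposals, num_snippets=10)
segments = segments_above(fused, threshold=0.0)    # pseudo-label segments
mask = uncertainty_mask(fused, low=0.0, high=0.7)  # snippets to keep in the loss
```

In this toy setup, the negative lobes of each wavelet suppress snippets just outside a proposal, so overlapping proposals reinforce their shared interior while disagreeing boundaries end up with low fused scores that the mask can exclude from training.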

TBT-Former: Learning Temporal Boundary Distributions for Action Localization

CV and Pattern Recognition

Helps computers know exactly when actions start and end.

1 Dec 2025 0

86%

Multi-task Learning with Extended Temporal Shift Module for Temporal Action Localization

CV and Pattern Recognition

Helps videos find and name actions happening.

12 Dec 2025 0

86%

Weakly Supervised Multimodal Temporal Forgery Localization via Multitask Learning

CV and Pattern Recognition

Finds fake videos even with little clues.

4 Aug 2025 1

Country of Origin

🇺🇸 United States

Page Count

10 pages
