Learning Activity View-invariance Under Extreme Viewpoint Changes via Curriculum Knowledge Distillation
By: Arjun Somayazulu, Efi Mavroudi, Changan Chen, and others
Potential Business Impact:
Teaches computers to understand videos from any angle.
Traditional methods for view-invariant learning from video rely on controlled multi-view settings with minimal scene clutter. However, they struggle with in-the-wild videos that exhibit extreme viewpoint differences and share little visual content. We introduce a method for learning rich video representations in the presence of such severe viewpoint-induced occlusions. We first define a geometry-based metric that ranks views at a fine-grained temporal scale by their likely occlusion level. Then, using those rankings, we formulate a knowledge distillation objective that preserves action-centric semantics, combined with a novel curriculum learning procedure that pairs incrementally more challenging views over time, allowing smooth adaptation to extreme viewpoint differences. We evaluate our approach on two tasks, outperforming SOTA models on both temporal keystep grounding and fine-grained keystep recognition benchmarks, particularly on views that exhibit severe occlusion.
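The curriculum idea above can be sketched in a few lines: rank views by an occlusion score, unlock harder views as training progresses, and apply a distillation loss between paired view embeddings. This is a minimal illustration, not the paper's implementation; the occlusion scores, the linear unlocking schedule, and the cosine-distance loss are all hypothetical stand-ins for the geometry-based metric and objective the paper defines.

```python
import numpy as np

def occlusion_rank(scores):
    """Rank view indices from least to most occluded.
    `scores` are hypothetical per-view occlusion estimates."""
    return np.argsort(scores)

def curriculum_pairs(ranked, step, total_steps):
    """Pair the least-occluded (anchor) view with incrementally
    harder views as training progresses (linear schedule assumed)."""
    n = len(ranked)
    frac = min(1.0, (step + 1) / total_steps)       # fraction of difficulty unlocked
    hardest_allowed = max(1, int(round(frac * (n - 1))))
    anchor = int(ranked[0])
    return [(anchor, int(ranked[i])) for i in range(1, hardest_allowed + 1)]

def distill_loss(teacher_emb, student_emb):
    """Cosine-distance distillation objective between paired view embeddings
    (a common stand-in; the paper's action-centric objective differs)."""
    t = teacher_emb / np.linalg.norm(teacher_emb)
    s = student_emb / np.linalg.norm(student_emb)
    return 1.0 - float(t @ s)
```

Early in training, only the anchor and the next-easiest view are paired; by the final step, the anchor is paired with every view, including the most occluded one, which is what lets the student adapt gradually to extreme viewpoint differences.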
Similar Papers
View-aware Cross-modal Distillation for Multi-view Action Recognition
CV and Pattern Recognition
Lets computers understand actions from different camera angles.
Robust Cross-View Geo-Localization via Content-Viewpoint Disentanglement
CV and Pattern Recognition
Finds places on Earth using pictures taken from different viewpoints.
Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity
CV and Pattern Recognition
Makes small computer brains learn better from big ones.