View-aware Cross-modal Distillation for Multi-view Action Recognition

Published: November 17, 2025 | arXiv ID: 2511.12870v1

By: Trung Thanh Nguyen, Yasutomo Kawanishi, Vijay John, and more

Potential Business Impact:

Enables systems to recognize human actions from multiple camera angles, even when each camera sees only part of the scene.

Business Areas:
Image Recognition, Data and Analytics, Software

The widespread use of multi-sensor systems has increased research in multi-view action recognition. While existing approaches in multi-view setups with fully overlapping sensors benefit from consistent view coverage, partially overlapping settings where actions are visible in only a subset of views remain underexplored. This challenge becomes more severe in real-world scenarios, as many systems provide only limited input modalities and rely on sequence-level annotations instead of dense frame-level labels. In this study, we propose View-aware Cross-modal Knowledge Distillation (ViCoKD), a framework that distills knowledge from a fully supervised multi-modal teacher to a modality- and annotation-limited student. ViCoKD employs a cross-modal adapter with cross-modal attention, allowing the student to exploit multi-modal correlations while operating with incomplete modalities. Moreover, we propose a View-aware Consistency module to address view misalignment, where the same action may appear differently or only partially across viewpoints. It enforces prediction alignment when the action is co-visible across views, guided by human-detection masks and confidence-weighted Jensen-Shannon divergence between their predicted class distributions. Experiments on the real-world MultiSensor-Home dataset show that ViCoKD consistently outperforms competitive distillation methods across multiple backbones and environments, delivering significant gains and surpassing the teacher model under limited conditions.
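To make the View-aware Consistency idea concrete: it can be pictured as a masked, confidence-weighted Jensen-Shannon penalty between the class distributions predicted from two views, applied only where human-detection masks indicate the action is co-visible. The following is a minimal PyTorch sketch under assumed tensor shapes; the function names, weighting scheme, and masking convention are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a view-aware consistency term: confidence-weighted
# Jensen-Shannon divergence between two views' predictions, masked
# by co-visibility. All names and shapes here are assumptions.
import torch
import torch.nn.functional as F


def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between per-sample class distributions."""
    m = 0.5 * (p + q)
    kl_pm = (p * (torch.log(p + eps) - torch.log(m + eps))).sum(dim=-1)
    kl_qm = (q * (torch.log(q + eps) - torch.log(m + eps))).sum(dim=-1)
    return 0.5 * (kl_pm + kl_qm)


def view_consistency_loss(logits_a, logits_b, covisible, conf_weighting=True):
    """
    logits_a, logits_b: [B, C] student logits for the same sequence from two views.
    covisible:          [B] boolean mask from human detection (action seen in both views).
    Returns a scalar consistency loss; zero when no sample is co-visible.
    """
    p = F.softmax(logits_a, dim=-1)
    q = F.softmax(logits_b, dim=-1)
    js = js_divergence(p, q)                       # [B]
    if conf_weighting:
        # Down-weight pairs where either view is uncertain
        # (here: product of the two views' max class probabilities).
        w = p.max(dim=-1).values * q.max(dim=-1).values
        js = w * js
    mask = covisible.float()
    denom = mask.sum().clamp(min=1.0)
    return (js * mask).sum() / denom
```

In this reading, the mask prevents the model from being forced to agree across views when an action is simply not visible in one of them, while the confidence weighting keeps low-confidence predictions from dominating the alignment signal.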

Country of Origin
🇯🇵 Japan

Page Count
14 pages

Category
Computer Science:
Computer Vision and Pattern Recognition