FedVideoMAE: Efficient Privacy-Preserving Federated Video Moderation
By: Ziyuan Tao, Chuanzhi Xu, Sandaru Jayawardana, and more
The rapid growth of short-form video platforms increases the need for privacy-preserving moderation, as cloud-based pipelines expose raw videos to privacy risks and incur high bandwidth costs and inference latency. To address these challenges, we propose an on-device federated learning framework for video violence detection that integrates self-supervised VideoMAE representations, LoRA-based parameter-efficient adaptation, and defense-in-depth privacy protection. Our approach reduces the trainable parameter count to 5.5M (~3.5% of the 156M backbone) and incorporates DP-SGD with configurable privacy budgets and secure aggregation. Experiments on RWF-2000 with 40 clients achieve 77.25% accuracy without privacy protection and 65-66% under strong differential privacy, while reducing communication cost by 28.3× compared to full-model federated learning. The code is available at https://github.com/zyt-599/FedVideoMAE
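The efficiency claim rests on clients fine-tuning and transmitting only small low-rank adapters while the pretrained VideoMAE backbone stays frozen. The sketch below is a minimal, illustrative reconstruction of that idea in PyTorch, not the released implementation: the rank, scaling, toy layer sizes, and the choice to wrap only the query/value projections are assumptions for demonstration.

```python
# Minimal LoRA-style adaptation sketch (illustrative; not the FedVideoMAE code).
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained weight is never updated
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


def wrap_attention_projections(model: nn.Module, rank: int = 8) -> nn.Module:
    """Replace linear layers named 'query'/'value' with LoRA-wrapped copies (assumed targets)."""
    for name, module in model.named_children():
        if isinstance(module, nn.Linear) and name in {"query", "value"}:
            setattr(model, name, LoRALinear(module, rank=rank))
        else:
            wrap_attention_projections(module, rank)
    return model


if __name__ == "__main__":
    # Stand-in attention projections; a real run would wrap the VideoMAE encoder blocks.
    toy = nn.ModuleDict({
        "query": nn.Linear(768, 768),
        "key": nn.Linear(768, 768),
        "value": nn.Linear(768, 768),
    })
    wrap_attention_projections(toy)
    trainable = sum(p.numel() for p in toy.parameters() if p.requires_grad)
    total = sum(p.numel() for p in toy.parameters())
    print(f"trainable {trainable:,} / total {total:,} ({100 * trainable / total:.2f}%)")
```

Because only the adapters (plus a small classification head) are trained and exchanged, per-round communication scales with the adapter size rather than the backbone, which is consistent with the reported 28.3× saving (roughly 156M / 5.5M ≈ 28.4).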
Similar Papers
Federated Learning for Video Violence Detection: Complementary Roles of Lightweight CNNs and Vision-Language Models for Energy-Efficient Use
CV and Pattern Recognition
Makes cameras detect violence using less power.
Exploring Personalized Federated Learning Architectures for Violence Detection in Surveillance Videos
CV and Pattern Recognition
Finds trouble in city cameras better.
Frugal Federated Learning for Violence Detection: A Comparison of LoRA-Tuned VLMs and Personalized CNNs
CV and Pattern Recognition
Helps cameras spot danger using less power.