
Frugal Federated Learning for Violence Detection: A Comparison of LoRA-Tuned VLMs and Personalized CNNs

Published: October 20, 2025 | arXiv ID: 2510.17651v1

By: Sébastien Thuau, Siba Haidar, Ayush Bajracharya and more

Potential Business Impact:

Helps cameras spot danger using less power.

Business Areas:

Image Recognition, Data and Analytics, Software

We examine frugal federated learning approaches to violence detection by comparing two complementary strategies: (i) zero-shot and federated fine-tuning of vision-language models (VLMs), and (ii) personalized training of a compact 3D convolutional neural network (CNN3D). Using LLaVA-7B and a 65.8M-parameter CNN3D as representative cases, we evaluate accuracy, calibration, and energy usage under realistic non-IID settings. Both approaches exceed 90% accuracy. CNN3D slightly outperforms Low-Rank Adaptation (LoRA)-tuned VLMs in ROC AUC and log loss, while using less energy. VLMs remain favorable for contextual reasoning and multimodal inference. We quantify energy and CO$_2$ emissions across training and inference, and analyze sustainability trade-offs for deployment. To our knowledge, this is the first comparative study of LoRA-tuned vision-language models and personalized CNNs for federated violence detection, with an emphasis on energy efficiency and environmental metrics. These findings support a hybrid model: lightweight CNNs for routine classification, with selective VLM activation for complex or descriptive scenarios. The resulting framework offers a reproducible baseline for responsible, resource-aware AI in video surveillance, with extensions toward real-time, multimodal, and lifecycle-aware systems.
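The hybrid model suggested in the abstract (a lightweight CNN for routine classification, with a VLM activated only for harder cases) can be illustrated with a confidence-gated cascade. The sketch below is not the authors' implementation: the names `route_clip`, `Decision`, `toy_cnn`, and `toy_vlm`, the thresholds `low` and `high`, and the use of CNN confidence as the trigger for VLM activation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Decision:
    label: str       # "violent" or "non-violent"
    source: str      # which model produced the label: "cnn" or "vlm"
    cnn_prob: float  # CNN's estimated probability of violence, kept for auditing


def route_clip(
    clip: Sequence[float],
    cnn_prob_fn: Callable[[Sequence[float]], float],
    vlm_label_fn: Callable[[Sequence[float]], str],
    low: float = 0.2,   # hypothetical threshold: at or below this, trust "non-violent"
    high: float = 0.8,  # hypothetical threshold: at or above this, trust "violent"
) -> Decision:
    """Run the cheap CNN first; call the costly VLM only when the CNN is unsure."""
    p = cnn_prob_fn(clip)
    if p >= high:
        return Decision("violent", "cnn", p)
    if p <= low:
        return Decision("non-violent", "cnn", p)
    # Ambiguous band between the thresholds: defer to the VLM.
    return Decision(vlm_label_fn(clip), "vlm", p)


# Toy stand-ins for the two models (assumptions, not real classifiers).
def toy_cnn(clip: Sequence[float]) -> float:
    return sum(clip) / len(clip)


def toy_vlm(clip: Sequence[float]) -> str:
    return "violent" if max(clip) > 0.9 else "non-violent"


if __name__ == "__main__":
    for clip in ([0.9, 0.9], [0.1, 0.1], [0.5, 0.95]):
        print(clip, route_clip(clip, toy_cnn, toy_vlm))
```

Under this design the VLM's energy cost is paid only for clips in the ambiguous band, so widening or narrowing the gap between `low` and `high` trades energy use against how often the VLM's contextual reasoning is consulted.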

Federated Learning for Video Violence Detection: Complementary Roles of Lightweight CNNs and Vision-Language Models for Energy-Efficient Use

CV and Pattern Recognition

Makes cameras detect violence using less power.

10 Nov 2025

90%

Architectural Co-Design for Zero-Shot Anomaly Detection: Decoupling Representation and Dynamically Fusing Features in CLIP

CV and Pattern Recognition

Finds hidden problems in pictures using words.

11 Aug 2025

89%

Towards Minimal Fine-Tuning of VLMs

CV and Pattern Recognition

Makes AI understand pictures and text better, faster.

22 Dec 2025


Country of Origin

🇫🇷 France

Page Count

7 pages
