TACTFL: Temporal Contrastive Training for Multi-modal Federated Learning with Similarity-guided Model Aggregation
By: Guanxiong Sun, Majid Mirmehdi, Zahraa Abdallah, and more
Potential Business Impact:
Helps computers learn from mixed, unlabelled data without sharing it.
Real-world federated learning faces two key challenges: limited access to labelled data and the presence of heterogeneous multi-modal inputs. This paper proposes TACTFL, a unified framework for semi-supervised multi-modal federated learning. TACTFL introduces a modality-agnostic temporal contrastive training scheme that conducts representation learning from unlabelled client data by leveraging temporal alignment across modalities. However, as clients perform self-supervised training on heterogeneous data, local models may diverge semantically. To mitigate this, TACTFL incorporates a similarity-guided model aggregation strategy that dynamically weights client models based on their representational consistency, promoting global alignment. Extensive experiments across diverse benchmarks and modalities, including video, audio, and wearable sensors, demonstrate that TACTFL achieves state-of-the-art performance. For instance, on the UCF101 dataset with only 10% labelled data, TACTFL attains 68.48% top-1 accuracy, significantly outperforming the FedOpt baseline of 35.35%. Code will be released upon publication.
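The paper's exact training code is not yet released, but the temporal contrastive scheme described above can be sketched as a standard InfoNCE-style objective in which embeddings from two modalities at the same time step of the same clip are positives and all other pairs are negatives. The following is a minimal sketch under that assumption; the function name, temperature, and batching are illustrative, not TACTFL's published implementation:

```python
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(z_a: torch.Tensor,
                              z_b: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss over two time-aligned modality streams.

    z_a, z_b: (N, T, D) embeddings from two modalities of the same
    N clips, each with T aligned time steps and D feature dims.
    Embeddings at the same (clip, time step) are positives; every
    other pair in the batch serves as a negative.
    """
    N, T, D = z_a.shape
    a = F.normalize(z_a.reshape(N * T, D), dim=-1)
    b = F.normalize(z_b.reshape(N * T, D), dim=-1)

    logits = a @ b.t() / temperature          # (N*T, N*T) similarity matrix
    targets = torch.arange(N * T, device=a.device)

    # Symmetric cross-entropy: modality A -> B and B -> A.
    loss_ab = F.cross_entropy(logits, targets)
    loss_ba = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_ab + loss_ba)
```

Because the objective only needs time-aligned embedding streams, it is agnostic to which modalities (video, audio, wearable sensors) produced them, which matches the "modality-agnostic" framing in the abstract.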
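Likewise, the similarity-guided aggregation can be pictured as a FedAvg-style weighted average whose weights reflect how consistently each client's representations agree with the others, so semantically divergent clients contribute less to the global model. The sketch below assumes the server scores clients by the cosine similarity of mean embeddings on a shared probe batch; that measure, and all names here, are assumptions rather than the paper's published method:

```python
import torch

def similarity_guided_aggregate(client_states, probe_features):
    """Weight client models by their representational consistency.

    client_states : list of state_dicts from K client models.
    probe_features: list of (M, D) embedding tensors, one per client,
                    from running each client model on a shared probe
                    batch (an assumed similarity measure).
    """
    K = len(client_states)
    # Mean embedding per client, L2-normalised.
    reps = torch.stack([f.mean(dim=0) for f in probe_features])
    reps = torch.nn.functional.normalize(reps, dim=-1)

    # A client's score is its average cosine similarity to the others.
    sim = reps @ reps.t()                     # (K, K)
    scores = (sim.sum(dim=1) - sim.diag()) / (K - 1)
    weights = torch.softmax(scores, dim=0)

    # Similarity-weighted parameter average (FedAvg-style).
    global_state = {}
    for key in client_states[0]:
        stacked = torch.stack([s[key].float() for s in client_states])
        w = weights.view(-1, *([1] * (stacked.dim() - 1)))
        global_state[key] = (w * stacked).sum(dim=0)
    return global_state, weights
```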
Similar Papers
Context-aware TFL: A Universal Context-aware Contrastive Learning Framework for Temporal Forgery Localization
CV and Pattern Recognition
Finds fake parts in videos.
Not All Clients Are Equal: Personalized Federated Learning on Heterogeneous Multi-Modal Clients
Machine Learning (CS)
AI learns from everyone without sharing private data.
BlendFL: Blended Federated Learning for Handling Multimodal Data Heterogeneity
Machine Learning (CS)
Helps computers learn from mixed data without sharing.