Score: 1

TACTFL: Temporal Contrastive Training for Multi-modal Federated Learning with Similarity-guided Model Aggregation

Published: September 22, 2025 | arXiv ID: 2509.17532v1

By: Guanxiong Sun, Majid Mirmehdi, Zahraa Abdallah, and more

Potential Business Impact:

Enables devices to learn collaboratively from mixed-modality, largely unlabelled data without sharing the raw data itself.

Business Areas:
Text Analytics, Data and Analytics, Software

Real-world federated learning faces two key challenges: limited access to labelled data and the presence of heterogeneous multi-modal inputs. This paper proposes TACTFL, a unified framework for semi-supervised multi-modal federated learning. TACTFL introduces a modality-agnostic temporal contrastive training scheme that conducts representation learning from unlabelled client data by leveraging temporal alignment across modalities. However, as clients perform self-supervised training on heterogeneous data, local models may diverge semantically. To mitigate this, TACTFL incorporates a similarity-guided model aggregation strategy that dynamically weights client models based on their representational consistency, promoting global alignment. Extensive experiments across diverse benchmarks and modalities, including video, audio, and wearable sensors, demonstrate that TACTFL achieves state-of-the-art performance. For instance, on the UCF101 dataset with only 10% labelled data, TACTFL attains 68.48% top-1 accuracy, significantly outperforming the FedOpt baseline of 35.35%. Code will be released upon publication.
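The abstract describes two mechanisms: a temporal contrastive objective that treats temporally aligned clips from different modalities as positive pairs, and an aggregation rule that weights each client by how consistent its representations are with the other clients'. Below is a minimal PyTorch sketch of how such a scheme might look. The InfoNCE-style loss, the shared probe batch, and all function names are illustrative assumptions, not the paper's actual implementation (the authors' code is not yet released).

```python
# Hedged sketch of TACTFL-style training and aggregation; details are assumed.
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE over temporally aligned segments from two modalities.

    z_a, z_b: (N, D) embeddings where row i of each tensor comes from the
    same time window (positive pair); all other rows act as negatives.
    """
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetric loss: modality A -> B and B -> A.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def similarity_guided_weights(client_embeddings):
    """Weight each client by the consistency of its representations with
    the other clients', measured on a shared probe batch (hypothetical).

    client_embeddings: list of (N, D) tensors, one per client, produced by
    each client's local encoder on the same probe inputs.
    """
    zs = torch.stack([F.normalize(z.mean(0), dim=0)
                      for z in client_embeddings])  # (K, D), one per client
    sim = zs @ zs.t()                               # (K, K) cosine similarities
    K = sim.size(0)
    # A client's score: mean similarity to every other client.
    scores = (sim.sum(1) - sim.diag()) / (K - 1)
    return torch.softmax(scores, dim=0)             # weights summing to 1

def aggregate(client_states, weights):
    """Weighted average of client model parameters (FedAvg-style)."""
    return {key: sum(w * s[key].float()
                     for w, s in zip(weights, client_states))
            for key in client_states[0]}
```

Under these assumptions, down-weighting clients whose representations disagree with the consensus is what counters the semantic divergence that the abstract attributes to self-supervised training on heterogeneous local data.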

Country of Origin
🇬🇧 United Kingdom

Page Count
15 pages

Category
Computer Science:
Distributed, Parallel, and Cluster Computing