Score: 1

MHARFedLLM: Multimodal Human Activity Recognition Using Federated Large Language Model

Published: August 3, 2025 | arXiv ID: 2508.01701v1

By: Asmit Bandyopadhyay, Rohit Basu, Tanmay Sen, and more

Potential Business Impact:

Helps computers recognize what people are doing from everyday sensors such as depth cameras, pressure mats, and accelerometers, supporting fitness tracking, smart homes, and healthcare monitoring.

Human Activity Recognition (HAR) plays a vital role in applications such as fitness tracking, smart homes, and healthcare monitoring. Traditional HAR systems often rely on single modalities, such as motion sensors or cameras, limiting robustness and accuracy in real-world environments. This work presents FedTime-MAGNET, a novel multimodal federated learning framework that advances HAR by combining heterogeneous data sources: depth cameras, pressure mats, and accelerometers. At its core is the Multimodal Adaptive Graph Neural Expert Transformer (MAGNET), a fusion architecture that uses graph attention and a Mixture of Experts to generate unified, discriminative embeddings across modalities. To capture complex temporal dependencies, a lightweight T5 encoder-only architecture is customized and adapted within this framework. Extensive experiments show that FedTime-MAGNET significantly improves HAR performance, achieving a centralized F1 score of 0.934 and a strong federated F1 score of 0.881. These results demonstrate the effectiveness of combining multimodal fusion, time-series LLMs, and federated learning for building accurate and robust HAR systems.
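To make the fusion idea concrete, here is a minimal sketch of how per-modality encoders, attention across modality embeddings, and a Mixture of Experts could produce a unified representation. This is not the authors' implementation: the class names (ModalityGraphFusion, MoEHead), the feature dimensions, the number of experts and classes, and the use of PyTorch's nn.MultiheadAttention as a stand-in for the paper's graph attention are all assumptions, and the T5-based temporal encoder and federated training loop are omitted.

```python
# Hypothetical sketch of a MAGNET-style fusion block (not the paper's code):
# each modality embedding becomes a graph node, attention mixes the nodes,
# and a small Mixture-of-Experts head produces the fused embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEHead(nn.Module):
    """Softmax-gated mixture of expert MLPs over a fused feature vector."""

    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.gate(x), dim=-1)                    # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, D)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)          # (B, D)


class ModalityGraphFusion(nn.Module):
    """Treats per-modality embeddings as graph nodes and fuses them with
    multi-head attention (a simplified stand-in for graph attention)."""

    def __init__(self, in_dims: dict, dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.proj = nn.ModuleDict({m: nn.Linear(d, dim) for m, d in in_dims.items()})
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.moe = MoEHead(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, inputs: dict) -> torch.Tensor:
        # One node per modality: (B, num_modalities, D)
        nodes = torch.stack([self.proj[m](x) for m, x in inputs.items()], dim=1)
        fused, _ = self.attn(nodes, nodes, nodes)   # attention over modality nodes
        pooled = fused.mean(dim=1)                  # (B, D) unified embedding
        return self.classifier(self.moe(pooled))


if __name__ == "__main__":
    # Illustrative per-modality feature dimensions (not from the paper).
    model = ModalityGraphFusion({"depth": 256, "pressure": 64, "accel": 32})
    batch = {"depth": torch.randn(8, 256),
             "pressure": torch.randn(8, 64),
             "accel": torch.randn(8, 32)}
    print(model(batch).shape)  # torch.Size([8, 10])
```

In the full framework, each client would train such a model on its own sensor data and share only model updates with a server for aggregation (for example, FedAvg-style weight averaging), which is the setting behind the federated F1 score reported above.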

Repos / Data Links

Page Count
9 pages

Category
Computer Science:
Machine Learning (CS)