HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
By: Zihan Fang, Zheng Lin, Senkang Hu, and more
Potential Business Impact:
Trains smart AI on phones without sharing private data.
While federated learning (FL) enables fine-tuning of large language models (LLMs) without compromising data privacy, the substantial size of an LLM renders on-device training impractical for resource-constrained clients, such as mobile devices. Thus, Mixture-of-Experts (MoE) models have emerged as a computation-efficient solution, which activates only a sparse subset of experts during model training to reduce the computing burden without sacrificing performance. Though integrating MoE into FL fine-tuning holds significant potential, it still encounters three key challenges: i) selecting appropriate experts for clients remains challenging due to the lack of a reliable metric to measure each expert's impact on local fine-tuning performance, ii) the heterogeneous computing resources across clients severely hinder MoE-based LLM fine-tuning, as dynamic expert activations across diverse input samples can overwhelm resource-constrained devices, and iii) client-specific expert subsets and routing preferences undermine global aggregation, where misaligned expert updates and inconsistent gating networks introduce destructive interference. To address these challenges, we propose HFedMoE, a heterogeneous MoE-based FL fine-tuning framework that customizes a subset of experts to each client for computation-efficient LLM fine-tuning. Specifically, HFedMoE identifies each expert's importance based on its contribution to fine-tuning performance, and then adaptively selects a subset of experts from an information bottleneck perspective to align with each client's computing budget. A sparsity-aware model aggregation strategy is also designed to aggregate the actively fine-tuned experts and gating parameters with importance-weighted contributions. Extensive experiments demonstrate that HFedMoE outperforms state-of-the-art benchmarks in training accuracy and convergence speed.
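To make the two main ideas concrete, the following is a minimal sketch, not the paper's implementation: it assumes a simple greedy selection of the highest-importance experts under a per-client compute budget (standing in for the paper's information-bottleneck criterion) and an importance-weighted average over only the clients that actually fine-tuned each expert (standing in for the sparsity-aware aggregation). All names, such as select_experts and aggregate_experts, are illustrative.

```python
# Illustrative sketch of budget-constrained expert selection and
# sparsity-aware, importance-weighted aggregation (not the paper's code).
from typing import Dict, List
import numpy as np


def select_experts(importance: np.ndarray, cost: np.ndarray, budget: float) -> List[int]:
    """Greedily keep the most important experts whose total cost fits the client's budget."""
    order = np.argsort(-importance)          # most important experts first
    chosen, spent = [], 0.0
    for e in order:
        if spent + cost[e] <= budget:
            chosen.append(int(e))
            spent += cost[e]
    return chosen


def aggregate_experts(
    client_updates: List[Dict[int, np.ndarray]],   # per client: expert_id -> updated weights
    client_importance: List[Dict[int, float]],     # per client: expert_id -> importance score
) -> Dict[int, np.ndarray]:
    """Average each expert only over clients that trained it, weighted by its importance."""
    aggregated: Dict[int, np.ndarray] = {}
    trained_experts = {e for upd in client_updates for e in upd}
    for expert_id in trained_experts:
        weights, scores = [], []
        for upd, imp in zip(client_updates, client_importance):
            if expert_id in upd:                   # skip clients that froze this expert
                weights.append(upd[expert_id])
                scores.append(imp[expert_id])
        norm = np.asarray(scores) / np.sum(scores)
        aggregated[expert_id] = sum(w * s for w, s in zip(weights, norm))
    return aggregated
```

A greedy knapsack-style selection and weighted averaging are only stand-ins here; the paper's actual importance metric, information-bottleneck selection, and gating-parameter aggregation are described in the full text.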
Similar Papers
FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment
Machine Learning (CS)
Helps AI learn better on phones with less data.
Federated Fine-Tuning of Sparsely-Activated Large Language Models on Resource-Constrained Devices
Distributed, Parallel, and Cluster Computing
Makes smart computer brains learn faster on weak computers.
FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge
Machine Learning (CS)
Teaches AI to learn from many computers without sharing secrets.