FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge
By: Gang Hu, Yinglei Teng, Pengfei Wu, and more
Potential Business Impact:
Teaches AI to learn from many computers without sharing secrets.
As Foundation Models (FMs) drive progress toward Artificial General Intelligence (AGI), fine-tuning them under privacy and resource constraints has become increasingly critical, particularly when high-quality training data resides on distributed edge devices. Federated Learning (FL) offers a compelling solution through Federated Fine-Tuning (FFT), which enables collaborative model adaptation without sharing raw data. Recent approaches incorporate Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) to reduce computational overhead. However, LoRA-based FFT faces two major limitations in heterogeneous FL environments: structural incompatibility across clients with varying LoRA configurations, and limited adaptability to non-IID data distributions, both of which hinder convergence and generalization. To address these challenges, we propose FFT-MoE, a novel FFT framework that replaces LoRA with sparse Mixture-of-Experts (MoE) adapters. Each client trains a lightweight gating network to selectively activate a personalized subset of experts, enabling fine-grained adaptation to local resource budgets while preserving aggregation compatibility. To further combat the expert load imbalance caused by device and data heterogeneity, we introduce a heterogeneity-aware auxiliary loss that dynamically regularizes the routing distribution to ensure expert diversity and balanced utilization. Extensive experiments spanning both IID and non-IID conditions demonstrate that FFT-MoE consistently outperforms state-of-the-art FFT baselines in generalization performance and training efficiency.
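To make the core mechanism concrete, below is a minimal PyTorch sketch of a sparse MoE adapter: a lightweight gating network routes each token to a small top-k subset of expert adapters, and an auxiliary term penalizes uneven routing. This is an illustrative assumption based on standard sparse-MoE practice (module names, top-k routing, and the uniform-load penalty are hypothetical), not the authors' actual FFT-MoE implementation or their heterogeneity-aware loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEAdapter(nn.Module):
    """Illustrative sparse MoE adapter: a per-client gating network activates
    only the top-k experts per token (hypothetical sketch, not the paper's code)."""
    def __init__(self, d_model: int, num_experts: int = 8, d_hidden: int = 64, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # lightweight gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                  # x: (batch, tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)            # routing distribution (B, T, E)
        topk_p, topk_idx = probs.topk(self.top_k, dim=-1)  # activate only top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[..., k] == e               # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += topk_p[..., k][mask].unsqueeze(-1) * expert(x[mask])
        # Simple load-balancing term (assumption): penalize deviation of the mean
        # routing probability from a uniform distribution over experts.
        load = probs.mean(dim=(0, 1))
        aux_loss = ((load - 1.0 / len(self.experts)) ** 2).sum()
        return out, aux_loss
```

In a federated setting along the lines described above, each client would keep its own gating network (personalized routing) while the expert adapters share a common structure, which is what keeps server-side aggregation straightforward despite heterogeneous resource budgets.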
Similar Papers
Federated Fine-Tuning of Sparsely-Activated Large Language Models on Resource-Constrained Devices
Distributed, Parallel, and Cluster Computing
Makes smart computer brains learn faster on weak computers.
TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts
Machine Learning (CS)
Makes AI learn many tasks using less computer power.
FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts
Machine Learning (CS)
Makes AI learn new things faster, using less power.