Score: 1

Adaptive Federated Distillation for Multi-Domain Non-IID Textual Data

Published: August 28, 2025 | arXiv ID: 2508.20557v1

By: Jiahao Xiao, Jiangming Liu

Potential Business Impact:

Helps AI learn from many different kinds of text.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

The widespread success of pre-trained language models (PLMs) has established a new training paradigm, in which a global PLM is fine-tuned using task-specific data from local clients. Local data often differ substantially across clients and cannot capture the global distribution of real-world data. To address the challenges of non-IID data in real environments, privacy-preserving federated distillation has been proposed and widely studied. However, previous experimental non-IID scenarios have primarily been defined by label (output) diversity, without considering the diversity of language domains (input) that is crucial in natural language processing. In this paper, we introduce a comprehensive set of multi-domain non-IID scenarios and propose a unified benchmarking framework that includes diverse data. The benchmark can be used to evaluate federated learning frameworks under realistic conditions. To this end, we propose an Adaptive Federated Distillation (AdaFD) framework designed to address multi-domain non-IID challenges in both homogeneous and heterogeneous settings. Experimental results demonstrate that our models capture the diversity of local clients and achieve better performance than existing approaches. The code for this paper is available at: https://github.com/jiahaoxiao1228/AdaFD.
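To make the idea concrete, here is a minimal sketch of the general adaptive federated distillation pattern the abstract describes: each client shares predictions (not raw data) on a common transfer set, and the server weights those predictions per sample before distilling them into the global model. The confidence-based weighting below (negative prediction entropy) is an illustrative assumption, not the paper's actual AdaFD aggregation rule.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_aggregate(client_logits):
    """Combine per-client logits on a shared transfer set into soft targets.

    client_logits: list of K arrays, each of shape (N, C) — one client's
    logits for N transfer-set samples over C classes.

    Sketch of an adaptive rule: weight each client per sample by its
    confidence (negative entropy), so a client whose training domain
    matches a sample contributes more to that sample's distillation target.
    """
    probs = np.stack([softmax(l) for l in client_logits])   # (K, N, C)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)  # (K, N)
    weights = softmax(-entropy, axis=0)[..., None]           # (K, N, 1)
    return (weights * probs).sum(axis=0)                     # (N, C) targets
```

The resulting per-sample soft targets would then serve as the teacher signal when fine-tuning the global model, which is what lets heterogeneous client architectures participate: only predictions, not parameters, cross the network.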

Country of Origin
🇨🇳 China

Repos / Data Links
https://github.com/jiahaoxiao1228/AdaFD

Page Count
9 pages

Category
Computer Science:
Computation and Language