PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts
By: Yang Su, Na Yan, Yansha Deng and more
Potential Business Impact:
Keeps private info on the phone instead of sending it to the cloud.
Large language models (LLMs) hosted on cloud servers alleviate the computational and storage burdens on local devices but raise privacy concerns due to sensitive data transmission and require substantial communication bandwidth, which is challenging in bandwidth-constrained environments. In contrast, small language models (SLMs) running locally enhance privacy but suffer from limited performance on complex tasks. To balance computational cost, performance, and privacy protection under bandwidth constraints, we propose a privacy-aware wireless collaborative mixture of experts (PWC-MoE) framework. Specifically, PWC-MoE employs a sparse privacy-aware gating network to dynamically route sensitive tokens to privacy experts located on local clients, while non-sensitive tokens are routed to non-privacy experts located at the remote base station. To achieve computational efficiency, the gating network ensures that each token is dynamically routed to and processed by only one expert. To enhance scalability and prevent overloading of specific experts, we introduce a group-wise load-balancing mechanism for the gating network that evenly distributes sensitive tokens among privacy experts and non-sensitive tokens among non-privacy experts. To adapt to bandwidth constraints while preserving model performance, we propose a bandwidth-adaptive and importance-aware token offloading scheme. This scheme incorporates an importance predictor to evaluate the importance scores of non-sensitive tokens, prioritizing the most important tokens for transmission to the base station based on their predicted importance and the available bandwidth. Experiments demonstrate that the PWC-MoE framework effectively preserves privacy and maintains high performance even in bandwidth-constrained environments, offering a practical solution for deploying LLMs in privacy-sensitive and bandwidth-limited scenarios.
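To make the routing idea concrete, the sketch below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the module name PrivacyAwareGate, the single linear router, the boolean sensitivity mask, and the top-k selection under a bandwidth budget are all assumptions. The paper's group-wise load-balancing loss and learned importance predictor are only referenced in comments.

```python
# Hypothetical sketch of PWC-MoE-style routing (assumed design, not the paper's code):
# sensitive tokens may only be assigned to local "privacy" experts, non-sensitive
# tokens only to remote "non-privacy" experts, and each token is sent to exactly
# one expert (top-1 gating).

import torch
import torch.nn as nn
import torch.nn.functional as F


class PrivacyAwareGate(nn.Module):
    def __init__(self, d_model: int, n_privacy: int, n_non_privacy: int):
        super().__init__()
        self.n_privacy = n_privacy
        # A single router scores all experts; a mask restricts each token to its
        # allowed expert group. (The paper additionally trains a group-wise
        # load-balancing loss, omitted here.)
        self.router = nn.Linear(d_model, n_privacy + n_non_privacy)

    def forward(self, tokens: torch.Tensor, is_sensitive: torch.Tensor):
        """tokens: (n_tokens, d_model); is_sensitive: (n_tokens,) bool."""
        logits = self.router(tokens)                      # (n_tokens, n_experts)
        privacy_slots = torch.zeros_like(logits, dtype=torch.bool)
        privacy_slots[:, : self.n_privacy] = True
        # Sensitive tokens see only privacy experts; others see only non-privacy experts.
        allowed = torch.where(is_sensitive.unsqueeze(1), privacy_slots, ~privacy_slots)
        logits = logits.masked_fill(~allowed, float("-inf"))
        probs = F.softmax(logits, dim=-1)
        expert_idx = probs.argmax(dim=-1)                 # top-1: one expert per token
        return expert_idx, probs


def select_tokens_for_offloading(importance: torch.Tensor, budget: int) -> torch.Tensor:
    """Keep only the `budget` highest-scoring non-sensitive tokens for uplink
    transmission; `importance` stands in for the paper's importance predictor."""
    budget = min(budget, importance.numel())
    return torch.topk(importance, k=budget).indices
```

Under this reading, privacy is enforced structurally (sensitive tokens never leave the client because their routing mask excludes remote experts), while the bandwidth constraint is met by transmitting only the top-scoring non-sensitive tokens.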
Similar Papers
PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs
Machine Learning (CS)
Trains big AI models together, privately.
Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey
Networking and Internet Architecture
Makes wireless networks smarter and faster.
Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression
Computation and Language
Makes AI smarter, faster, and lighter on memory.