Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning
By: Sugyeong Eo, Jungjun Lee, Chanjun Park, and more
Potential Business Impact:
Teaches computers to learn better from different tasks.
The sparse Mixture-of-Experts (MoE) architecture has emerged as a highly scalable solution that conditionally activates sub-modules without a proportional increase in computational cost. However, improving expert specialization to enhance performance and generalization remains a challenge for MoE, especially in instruction tuning scenarios characterized by significant input heterogeneity. In this work, we propose the Mixture-of-Clustered-Experts (MoCE) to address this limitation through a dual-stage routing mechanism. The first stage routes each input to an expert group based on sequence-level features, while the second stage activates the top-$k$ experts within that group at the token level. This approach effectively partitions heterogeneous inputs according to their knowledge requirements, encouraging expert group specialization while retaining the advantages of token-level routing. We evaluate MoCE across a comprehensive set of benchmarks, demonstrating its consistent superiority over strong baselines and its enhanced generalization capabilities. Detailed analysis further highlights the robustness and effectiveness of MoCE.
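To make the dual-stage routing idea concrete, below is a minimal sketch in PyTorch of a router that first assigns a whole sequence to an expert group and then applies token-level top-$k$ gating within that group. The module name, dimensions, mean-pooled sequence feature, argmax group selection, and the simple loop-based dispatch are all illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of dual-stage routing: sequence-level group choice, then
# token-level top-k expert gating inside the chosen group.
# Assumptions (not from the paper): mean pooling as the sequence-level feature,
# argmax group selection, FFN experts, and naive loop-based dispatch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoCERouterSketch(nn.Module):
    def __init__(self, d_model=512, n_groups=4, experts_per_group=4, top_k=2):
        super().__init__()
        self.n_groups = n_groups
        self.experts_per_group = experts_per_group
        self.top_k = top_k
        # Stage 1: route whole sequences to an expert group.
        self.group_gate = nn.Linear(d_model, n_groups)
        # Stage 2: route individual tokens to experts inside the chosen group.
        self.expert_gate = nn.Linear(d_model, n_groups * experts_per_group)
        # One small FFN expert per (group, expert) slot.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_groups * experts_per_group)
        ])

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        # Stage 1: sequence-level feature (mean pooling here) -> group choice.
        seq_feat = x.mean(dim=1)               # (batch, d_model)
        group_idx = self.group_gate(seq_feat).argmax(dim=-1)    # (batch,)

        out = torch.zeros_like(x)
        all_logits = self.expert_gate(x)       # (batch, seq_len, n_groups * experts_per_group)
        for b in range(x.size(0)):
            g = group_idx[b].item()
            start = g * self.experts_per_group
            # Stage 2: token-level top-k gating restricted to experts in group g.
            logits = all_logits[b, :, start:start + self.experts_per_group]
            topk_val, topk_idx = logits.topk(self.top_k, dim=-1)
            weights = F.softmax(topk_val, dim=-1)                # (seq_len, top_k)
            for k in range(self.top_k):
                for e in range(self.experts_per_group):
                    mask = topk_idx[:, k] == e
                    if mask.any():
                        out[b, mask] += weights[mask, k].unsqueeze(-1) * \
                                        self.experts[start + e](x[b, mask])
        return out


if __name__ == "__main__":
    router = MoCERouterSketch()
    tokens = torch.randn(2, 16, 512)           # two sequences of 16 tokens each
    print(router(tokens).shape)                # torch.Size([2, 16, 512])
```

The key design point the sketch illustrates is that the token-level gate only competes among experts of the group selected at the sequence level, so heterogeneous instructions are partitioned coarsely by sequence before fine-grained token routing takes place.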
Similar Papers
Mixture of Group Experts for Learning Invariant Representations
Machine Learning (CS)
Makes AI smarter by teaching experts to work together.
Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning
Machine Learning (CS)
Teaches computers to solve hard problems by splitting them.
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Machine Learning (CS)
Makes smart computer programs use less power.