Horseshoe Mixtures-of-Experts (HS-MoE)
By: Nick Polson, Vadim Sokolov
Horseshoe mixtures-of-experts (HS-MoE) models provide a Bayesian framework for sparse expert selection in mixture-of-experts architectures. We combine the horseshoe prior's adaptive global-local shrinkage with input-dependent gating, yielding data-adaptive sparsity in expert usage. Our primary methodological contribution is a particle learning algorithm for sequential inference, in which the filter is propagated forward in time while tracking only sufficient statistics. We also discuss how HS-MoE relates to modern mixture-of-experts layers in large language models, which are deployed under extreme sparsity constraints (e.g., activating a small number of experts per token out of a large pool).
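To fix notation, one standard way to write a horseshoe-gated mixture of experts is sketched below. The mixture likelihood with softmax gating and the horseshoe prior (Carvalho, Polson, and Scott, 2010) are both standard; placing the horseshoe on the gating weights is our illustrative choice and need not match the paper's exact specification.

\[
p(y \mid x) \;=\; \sum_{k=1}^{K} g_k(x)\, f_k(y \mid x, \beta_k),
\qquad
g_k(x) \;=\; \frac{\exp(w_k^\top x)}{\sum_{l=1}^{K} \exp(w_l^\top x)},
\]
\[
w_{kj} \mid \lambda_{kj}, \tau \;\sim\; \mathcal{N}\!\bigl(0,\; \lambda_{kj}^{2}\tau^{2}\bigr),
\qquad
\lambda_{kj} \;\sim\; \mathrm{C}^{+}(0,1),
\qquad
\tau \;\sim\; \mathrm{C}^{+}(0,1).
\]

The half-Cauchy local scales \lambda_{kj} and global scale \tau concentrate most gating weights near zero while leaving a few essentially unshrunk, so only a small, input-dependent subset of experts receives appreciable gate mass; this is the sense in which expert usage becomes data-adaptively sparse.

The particle learning mechanics can be illustrated on a much simpler model than HS-MoE. The sketch below is our own minimal example, not the paper's algorithm: particle learning for a toy two-component Gaussian mixture with unknown component means, known noise scale, and known mixture weights. Each particle carries only conjugate sufficient statistics (a posterior mean and variance per component), is resampled according to its one-step predictive density, then samples the latent allocation and updates its statistics in closed form. All names and settings (K, N, sigma, the priors) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

K, N = 2, 500                 # mixture components, particles
sigma = 1.0                   # known observation standard deviation
pi = np.array([0.5, 0.5])     # known mixture weights

# Each particle carries conjugate sufficient statistics per component:
# posterior mean m[i, k] and variance C[i, k]. Ordered prior means break
# the label symmetry of the toy mixture.
m = np.tile(np.array([-1.0, 1.0]), (N, 1))
C = np.full((N, K), 10.0)

def pl_step(y, m, C):
    # 1. One-step predictive weight of each particle:
    #    p(y | s_i) = sum_k pi_k * N(y; m_ik, C_ik + sigma^2).
    var = C + sigma**2
    comp = pi * np.exp(-0.5 * (y - m) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    w = comp.sum(axis=1)
    # 2. Resample particles proportional to their predictive weight.
    idx = rng.choice(N, size=N, p=w / w.sum())
    m, C, comp = m[idx], C[idx], comp[idx]
    # 3. Sample the allocation z ~ p(z | y, s) for each resampled particle.
    probs = comp / comp.sum(axis=1, keepdims=True)
    z = (rng.random(N)[:, None] > np.cumsum(probs, axis=1)).sum(axis=1)
    # 4. Deterministic conjugate (Kalman-style) update of the sufficient
    #    statistics of the chosen component.
    rows = np.arange(N)
    gain = C[rows, z] / (C[rows, z] + sigma**2)
    m[rows, z] += gain * (y - m[rows, z])
    C[rows, z] *= 1.0 - gain
    return m, C

# Run the filter on a simulated stream from a mixture with means -2 and 2.
true_means = np.array([-2.0, 2.0])
for _ in range(300):
    y = rng.normal(true_means[rng.integers(K)], sigma)
    m, C = pl_step(y, m, C)

print("estimated component means:", np.sort(m, axis=1).mean(axis=0))

The same resample, propagate, and update pattern is what a sequential filter for HS-MoE would follow, with the toy conjugate updates replaced by the model's own sufficient-statistic recursions.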