OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs
By: Yuting Gao, Weihao Chen, Lan Wang, and more
Potential Business Impact:
Teaches AI to judge its own answers better.
Preference learning has recently emerged as a pivotal strategy for post-training alignment of Multimodal Large Language Models (MLLMs). However, existing approaches predominantly rely on external human-annotated preference data, which is costly and labor-intensive to collect. In this work, we propose OrdMoE, a novel preference alignment framework that bypasses the reliance on external human preferences entirely by leveraging intrinsic signals within Mixture-of-Experts (MoE) architectures. Specifically, we observe that the router's expert selection scores implicitly encode a quality-aware ranking of responses (i.e., higher-scoring experts consistently generate higher-quality outputs). Building on this insight, OrdMoE constructs an internal preference hierarchy by grouping experts into ranked tiers based on their per-token routing scores and activating each tier separately to produce a sequence of responses with increasing quality. This yields a zero-cost, self-supervised preference ordering over generated responses, which can be directly optimized using standard preference learning objectives. Extensive experiments across multiple multimodal benchmarks demonstrate that OrdMoE significantly enhances both alignment and overall performance of multimodal Mixture-of-Experts LLMs, achieving competitive results without requiring any human-annotated preference data.
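To make the tiering idea concrete, here is a minimal sketch of how experts might be ranked by routing score, split into tiers, and how adjacent tiers' responses could form preference pairs for a standard objective such as DPO. All function names, tier sizes, and the use of mean per-token routing scores are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of OrdMoE-style tiered expert ranking.
# Tier construction details and the DPO pairing scheme are assumptions.
import torch
import torch.nn.functional as F

def rank_experts_by_router(router_logits: torch.Tensor) -> torch.Tensor:
    """Rank experts by their mean per-token routing score.

    router_logits: (num_tokens, num_experts) raw router outputs.
    Returns expert indices sorted from highest to lowest mean score.
    """
    mean_scores = router_logits.softmax(dim=-1).mean(dim=0)  # (num_experts,)
    return mean_scores.argsort(descending=True)

def build_tiers(ranked_experts: torch.Tensor, num_tiers: int) -> list[torch.Tensor]:
    """Split the ranked expert list into contiguous tiers.

    Tier 0 holds the highest-scoring experts, so a response generated
    with only tier 0 active is expected to be the highest quality.
    """
    return list(ranked_experts.chunk(num_tiers))

def dpo_loss(logp_chosen: torch.Tensor, logp_rejected: torch.Tensor,
             ref_chosen: torch.Tensor, ref_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective on one (higher-tier, lower-tier) response pair."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy usage: 64 tokens routed over 8 experts, split into 3 ranked tiers.
router_logits = torch.randn(64, 8)
tiers = build_tiers(rank_experts_by_router(router_logits), num_tiers=3)

# Each tier would be activated separately during generation, yielding one
# response per tier; adjacent tiers then form chosen/rejected pairs.
logp = torch.randn(3)  # placeholder: policy log-probs of the 3 tier responses
ref = torch.randn(3)   # placeholder: reference-model log-probs
loss = sum(dpo_loss(logp[i], logp[i + 1], ref[i], ref[i + 1]) for i in range(2))
print(tiers, loss.item())
```

The key design point this sketch illustrates is that the preference ordering comes for free from the router: no human annotator labels which response is better, the tier ranking does.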
Similar Papers
Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression
Computation and Language
Makes AI smarter, faster, and use less memory.
Bayesian Mixture of Experts For Large Language Models
Machine Learning (CS)
Helps AI know when it's unsure about answers.
AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert
Machine Learning (CS)
Smartly uses computer power for better understanding.