OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs
By: Yuting Gao, Weihao Chen, Lan Wang, and more
Potential Business Impact:
Teaches AI to judge its own answers better.
Preference learning has recently emerged as a pivotal strategy for post-training alignment of Multimodal Large Language Models (MLLMs). However, existing approaches predominantly rely on external human-annotated preference data, which is costly and labor-intensive to collect. In this work, we propose OrdMoE, a novel preference alignment framework that bypasses the reliance on external human preferences entirely by leveraging intrinsic signals within Mixture-of-Experts (MoE) architectures. Specifically, we observe that the router's expert selection scores implicitly encode a quality-aware ranking of responses (i.e., higher-scoring experts consistently generate higher-quality outputs). Building on this insight, OrdMoE constructs an internal preference hierarchy by grouping experts into ranked tiers based on their per-token routing scores and activating each tier separately to produce a sequence of responses with increasing quality. This yields a zero-cost, self-supervised preference ordering over generated responses, which can be directly optimized using standard preference learning objectives. Extensive experiments across multiple multimodal benchmarks demonstrate that OrdMoE significantly enhances both alignment and overall performance of multimodal Mixture-of-Experts LLMs, achieving competitive results without requiring any human-annotated preference data.
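To make the tiering idea concrete, here is a minimal sketch of how experts might be ranked by routing score, split into tiers, and how adjacent tiers' responses could form preference pairs for a standard objective such as DPO. All function names, tier sizes, and the use of mean per-token routing scores are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of OrdMoE-style tiered expert ranking.
# Tier construction details and the DPO pairing scheme are assumptions.
import torch
import torch.nn.functional as F

def rank_experts_by_router(router_logits: torch.Tensor) -> torch.Tensor:
    """Rank experts by their mean per-token routing score.

    router_logits: (num_tokens, num_experts) raw router outputs.
    Returns expert indices sorted from highest to lowest mean score.
    """
    mean_scores = router_logits.softmax(dim=-1).mean(dim=0)  # (num_experts,)
    return mean_scores.argsort(descending=True)

def build_tiers(ranked_experts: torch.Tensor, num_tiers: int) -> list[torch.Tensor]:
    """Split the ranked expert list into contiguous tiers.

    Tier 0 holds the highest-scoring experts, so a response generated
    with only tier 0 active is expected to be the highest quality.
    """
    return list(ranked_experts.chunk(num_tiers))

def dpo_loss(logp_chosen: torch.Tensor, logp_rejected: torch.Tensor,
             ref_chosen: torch.Tensor, ref_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective on one (higher-tier, lower-tier) response pair."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy usage: 64 tokens routed over 8 experts, split into 3 ranked tiers.
router_logits = torch.randn(64, 8)
tiers = build_tiers(rank_experts_by_router(router_logits), num_tiers=3)

# Each tier would be activated separately during generation, yielding one
# response per tier; adjacent tiers then form chosen/rejected pairs.
logp = torch.randn(3)  # placeholder: policy log-probs of the 3 tier responses
ref = torch.randn(3)   # placeholder: reference-model log-probs
loss = sum(dpo_loss(logp[i], logp[i + 1], ref[i], ref[i + 1]) for i in range(2))
print(tiers, loss.item())
```

The key design point this sketch illustrates is that the preference ordering comes for free from the router: no human annotator labels which response is better, the tier ranking does.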
Similar Papers
Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression
Computation and Language
Makes AI smarter, faster, and use less memory.
Bayesian Mixture of Experts For Large Language Models
Machine Learning (CS)
Helps AI know when it's unsure about answers.
AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert
Machine Learning (CS)
Smartly uses computer power for better understanding.