Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module
By: Ruitao Feng, Bixi Zhang, Sheng Liang, and more
Potential Business Impact:
Lets AI assistants understand speech and sounds without retraining the whole model.
Aligning pretrained audio encoders and Large Language Models (LLMs) offers a promising, parameter-efficient path to building powerful multimodal agents. However, existing methods often require costly full-model finetuning or rely on static adapters that may lack expressive power. Drawing inspiration from the Platonic Representation Hypothesis, we introduce SteerMoE, a novel and modular framework for audio-language alignment. SteerMoE freezes both the audio encoder and the LLM decoder, training only a lightweight steering module integrated within the encoder's layers. This module uses a Mixture-of-Experts (MoE) router to dynamically select and apply learned steering vectors, progressively transforming continuous audio representations into a space comprehensible to the LLM. By operating entirely in the continuous embedding space, our approach requires no modifications to the LLM's vocabulary and preserves its advanced reasoning and agentic capabilities. We demonstrate through experiments on ASR, audio understanding, and a qualitative function-calling task that SteerMoE achieves strong performance while remaining highly modular and computationally efficient, offering a robust new paradigm for developing sophisticated audio-language systems.
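The abstract describes the mechanism at a high level: a lightweight router inside the frozen encoder's layers picks a few learned steering vectors per token and adds them to the audio hidden states, nudging them toward a space the frozen LLM can read. Below is a minimal PyTorch sketch of one such steering layer, written under stated assumptions: the class and parameter names (SteerMoELayer, num_experts, top_k) and the top-k softmax routing with a residual add are illustrative choices, not the paper's exact implementation.

```python
# Minimal sketch of an MoE steering layer (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SteerMoELayer(nn.Module):
    """Adds a routed, learned steering vector to frozen encoder hidden states."""

    def __init__(self, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Each "expert" is a single learned steering vector in embedding space.
        self.experts = nn.Parameter(torch.randn(num_experts, hidden_dim) * 0.02)
        # Lightweight router producing per-token expert logits.
        self.router = nn.Linear(hidden_dim, num_experts)
        self.top_k = top_k

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim) activations from a frozen encoder layer.
        logits = self.router(h)                         # (B, T, E)
        weights, idx = logits.topk(self.top_k, dim=-1)  # route each token to top-k experts
        weights = F.softmax(weights, dim=-1)            # normalize over selected experts
        steer = self.experts[idx]                       # (B, T, k, D) gathered steering vectors
        # Weighted sum of the selected steering vectors, applied residually.
        return h + (weights.unsqueeze(-1) * steer).sum(dim=-2)
```

In training, only the router and the steering vectors would receive gradients while the audio encoder and LLM stay frozen, which is what would keep such an approach parameter-efficient and modular, consistent with the abstract's description.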
Similar Papers
Steering MoE LLMs via Expert (De)Activation
Computation and Language
Controls AI behavior without changing its brain.
Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression
Computation and Language
Makes AI smarter, faster, and more memory-efficient.
OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs
Machine Learning (CS)
Teaches AI to judge its own answers better.