MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts
By: Ivan Novikov
Potential Business Impact:
Makes AI models cheaper and faster to run, without retraining.
Large Language Models (LLMs) are predominantly deployed as dense transformers, where every parameter in every feed-forward block is activated for every token. This is architecturally simple but computationally inefficient, since inference cost scales linearly with parameter count. Recent upcycling methods such as MoEfication, CMoE, ToMoE, and MoORE reveal that much of the useful computation lives in sparse, semi-modular substructures inside dense feed-forward networks, but these approaches typically rely on clustering, activation profiling, singular value decomposition, or custom routing, and therefore on calibration data. This paper introduces MLPMoE (MLP Mixture-of-Experts), a training-free, deterministic transformation that restructures the dense MLP in transformer blocks into a static, high-cardinality mixture of experts. The transformation uses simple tensor slicing and summation, reinterpreting the algebra of tensor parallelism as a topological conversion rather than a distributed training pattern. We further introduce Fractal Fade (differential branch sparsity) and Compensated Pruning (variance-preserving branch reduction) as lightweight mechanisms for structured sparsity. On Qwen2.5-0.5B-Instruct and DeepSeek-R1-Distill-Llama-8B, the zero-shot MLPMoE transform changes a proxy perplexity metric by less than 0.05 percent while keeping the parameter count effectively constant. On the 8B model, differential sparsity removes about 20 percent of MLP parameters while keeping perplexity within about 2 percent of the dense baseline. The method operates entirely post hoc on existing checkpoints and requires no gradients, calibration sets, or router training. Code is available at https://gist.github.com/iwallarm/fc2ef1eddf226ca7814f9e5e2ae9bad1
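The core transform can be made concrete with a short sketch. The PyTorch snippet below assumes the SwiGLU MLP layout of Qwen2.5 and Llama blocks (bias-free gate, up, and down projections); the class name StaticMoE, the helper compensated_prune, and its square-root rescaling rule are illustrative assumptions, not the paper's API. The slicing itself is the exact tensor-parallel identity the abstract describes: SiLU and the elementwise product act per intermediate channel, and the down projection is linear, so the branch outputs sum to the dense output.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StaticMoE(nn.Module):
    """Dense SwiGLU MLP re-expressed as a sum of static expert branches."""
    def __init__(self, gate: nn.Linear, up: nn.Linear, down: nn.Linear,
                 num_experts: int):
        super().__init__()
        d_ff = gate.out_features
        assert d_ff % num_experts == 0, "intermediate dim must split evenly"
        chunk = d_ff // num_experts
        self.experts = nn.ModuleList()
        for e in range(num_experts):
            sl = slice(e * chunk, (e + 1) * chunk)
            g = nn.Linear(gate.in_features, chunk, bias=False)
            u = nn.Linear(up.in_features, chunk, bias=False)
            d = nn.Linear(chunk, down.out_features, bias=False)
            # Pure tensor slicing: rows of gate/up, matching columns of down.
            g.weight.data = gate.weight.data[sl, :].clone()
            u.weight.data = up.weight.data[sl, :].clone()
            d.weight.data = down.weight.data[:, sl].clone()
            self.experts.append(nn.ModuleDict({"gate": g, "up": u, "down": d}))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Summing all branches reproduces the dense MLP output exactly.
        return sum(ex["down"](F.silu(ex["gate"](x)) * ex["up"](x))
                   for ex in self.experts)

def compensated_prune(moe: StaticMoE, keep: int) -> None:
    # Hypothetical sketch of Compensated Pruning; the paper's exact rule is
    # not reproduced here. Drop the branches whose down-projection weights
    # have the smallest norm, then rescale survivors by sqrt(total / keep),
    # which preserves output variance if branch contributions are roughly
    # independent and zero-mean (an assumption made for illustration).
    norms = torch.tensor([ex["down"].weight.norm().item()
                          for ex in moe.experts])
    keep_idx = torch.topk(norms, keep).indices.sort().values.tolist()
    scale = (len(moe.experts) / keep) ** 0.5
    survivors = []
    for i in keep_idx:
        moe.experts[i]["down"].weight.data.mul_(scale)
        survivors.append(moe.experts[i])
    moe.experts = nn.ModuleList(survivors)

# Equivalence check on random weights: the zero-shot split is lossless.
torch.manual_seed(0)
d_model, d_ff = 64, 256
gate = nn.Linear(d_model, d_ff, bias=False)
up = nn.Linear(d_model, d_ff, bias=False)
down = nn.Linear(d_ff, d_model, bias=False)
moe = StaticMoE(gate, up, down, num_experts=8)
x = torch.randn(4, d_model)
dense = down(F.silu(gate(x)) * up(x))
assert torch.allclose(dense, moe(x), atol=1e-5)

Because the split is lossless, structured sparsity such as Fractal Fade or Compensated Pruning then operates on whole branches of an exact decomposition, which is consistent with the abstract's report that the zero-shot transform itself leaves the proxy perplexity essentially unchanged.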
Similar Papers
Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression
Computation and Language
Makes AI smarter, faster, and use less memory.
DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
Machine Learning (CS)
Makes smart computer programs run faster and better.
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
Machine Learning (CS)
Makes AI better at thinking, not just remembering.