VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
By: Ying Nie, Kai Han, Hongguang Li, and others
Potential Business Impact:
Makes AI smarter without needing more computer memory.
The rapid scaling of Large Language Models (LLMs) has delivered remarkable performance, but it also incurs prohibitive memory costs. Existing parameter-efficient approaches such as pruning and quantization mainly compress pretrained models without enhancing architectural capacity, and thus remain bounded by the representational ceiling of the base model. In this work, we propose VersatileFFN, a novel feed-forward network (FFN) that enables flexible reuse of parameters along both the width and depth dimensions within a fixed parameter budget. Inspired by the dual-process theory of cognition, VersatileFFN comprises two adaptive pathways: a width-versatile path that generates a mixture of sub-experts from a single shared FFN, mimicking sparse expert routing without adding parameters, and a depth-versatile path that recursively applies the same FFN to emulate deeper processing for complex tokens. A difficulty-aware gate dynamically balances the two pathways, steering "easy" tokens through the efficient width-wise route and allocating deeper iterative refinement to "hard" tokens. Crucially, both pathways reuse the same parameters, so all additional capacity comes from computation rather than memory. Experiments across diverse benchmarks and model scales demonstrate the effectiveness of the method. The code will be available at https://github.com/huawei-noah/noah-research/tree/master/VersatileFFN.
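Since the abstract describes the mechanism but not its implementation, the following is a minimal PyTorch sketch of the three described ingredients: a single shared FFN whose hidden layer is sliced into routed sub-experts (the width path), the same FFN applied recursively (the depth path), and a learned per-token gate that mixes the two. Every name, the GELU activation, the number of sub-experts, the recursion depth, and the router/gate forms are illustrative assumptions, not the authors' design; see the linked repository for the official code.

```python
# Minimal sketch of the VersatileFFN idea described in the abstract.
# All names, sizes, activations, and the router/gate forms are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VersatileFFNSketch(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_sub_experts=4,
                 top_k=2, depth_steps=2):
        super().__init__()
        assert d_ff % n_sub_experts == 0
        # One shared FFN: both pathways below reuse these same weights.
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.n_sub, self.chunk = n_sub_experts, d_ff // n_sub_experts
        self.top_k, self.depth_steps = top_k, depth_steps
        self.router = nn.Linear(d_model, n_sub_experts)  # width-path router (assumed form)
        self.gate = nn.Linear(d_model, 1)                # difficulty-aware gate (assumed form)

    def ffn(self, x):
        return self.down(F.gelu(self.up(x)))

    def width_path(self, x):
        # Slice the shared hidden layer into sub-experts and keep a softmax
        # mixture of the top-k slices per token: expert-style sparse routing
        # with no additional FFN parameters.
        scores = self.router(x)                               # (..., n_sub)
        top_val, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_val, dim=-1)
        mask = torch.zeros_like(scores).scatter_(-1, top_idx, weights)
        h = F.gelu(self.up(x)).view(*x.shape[:-1], self.n_sub, self.chunk)
        h = h * mask.unsqueeze(-1)                            # zero out unselected slices
        return self.down(h.reshape(*x.shape[:-1], -1))

    def depth_path(self, x):
        # Recursively apply the same FFN to emulate a deeper network.
        h = x
        for _ in range(self.depth_steps):
            h = h + self.ffn(h)   # iterative refinement with shared weights
        return h - x              # return only the accumulated refinement

    def forward(self, x):
        # Per-token difficulty score g: "hard" tokens (g near 1) receive the
        # deeper iterative route, "easy" tokens the cheap width-wise route.
        g = torch.sigmoid(self.gate(x))
        return g * self.depth_path(x) + (1.0 - g) * self.width_path(x)


if __name__ == "__main__":
    ffn = VersatileFFNSketch()
    tokens = torch.randn(2, 16, 512)     # (batch, seq_len, d_model)
    print(ffn(tokens).shape)             # torch.Size([2, 16, 512])
```

Note that in this sketch both pathways draw on the same `up`/`down` weights, so, as the abstract emphasizes, any extra capacity comes from additional computation (masked slices, repeated applications) rather than additional memory.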
Similar Papers
Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?
Machine Learning (CS)
Makes AI smarter by using its brain better.
COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
Computation and Language
Makes AI models smaller and faster to run.
Mixture-of-Channels: Exploiting Sparse FFNs for Efficient LLMs Pre-Training and Inference
Machine Learning (CS)
Makes AI models use less computer memory.