Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models

Published: May 27, 2025 | arXiv ID: 2505.21237v1

By: Zhaoqing Li, Haoning Xu, Xurong Xie, and more

Potential Business Impact:

Makes speech recognition models smaller and faster without loss of accuracy.

Business Areas:
Presentations Software

This paper presents a novel memory-efficient model compression approach for Conformer ASR and speech foundation systems. Our approach features a unique "small-to-large" design: a compact "seed" model containing a few Conformer or Transformer blocks is trained and unfolded many times to emulate the performance of larger uncompressed models with different logical depths. The seed model and the many unfolded paths are jointly trained within a single unfolding cycle, and the KL-divergence between the largest unfolded model and the smallest seed model is used in a self-distillation process to minimize their performance disparity. Experimental results show that our foldable model produces ASR performance comparable to individually constructed Conformer and wav2vec2/HuBERT speech foundation models under various depth configurations, while requiring only minimal memory and storage. Conformer and wav2vec2 models with 35% and 30% fewer parameters, respectively, are obtained without loss of performance.
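The sketch below is a minimal, hypothetical illustration (in PyTorch) of the "small-to-large" unfolding idea as described in the abstract: a compact seed of weight-shared blocks is re-applied ("unfolded") several times to emulate deeper encoders, and a KL-divergence self-distillation term pulls the shallow seed output toward the deepest unfolded output. The block type, dimensions, output head, depth choices, and loss weighting are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of seed unfolding + KL self-distillation (assumed details,
# not the authors' reference implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FoldableEncoder(nn.Module):
    """A few shared seed blocks, unfolded to a chosen logical depth."""

    def __init__(self, d_model=256, nhead=4, seed_blocks=2, vocab_size=1000):
        super().__init__()
        # The only stored parameters: a handful of seed blocks.
        self.seed = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead,
                                       dim_feedforward=4 * d_model,
                                       batch_first=True)
            for _ in range(seed_blocks)
        )
        self.head = nn.Linear(d_model, vocab_size)  # e.g. a CTC output layer

    def forward(self, x, unfold=1):
        # Re-applying the same seed blocks `unfold` times gives a logical
        # depth of seed_blocks * unfold with no additional parameters.
        for _ in range(unfold):
            for block in self.seed:
                x = block(x)
        return self.head(x)


def joint_unfolding_loss(model, feats, targets, task_loss_fn,
                         unfold_depths=(1, 2, 4), kd_weight=0.5, temp=2.0):
    """Jointly train all unfolded paths; deepest path teaches the seed."""
    logits = {u: model(feats, unfold=u) for u in unfold_depths}

    # Task loss summed over every unfolded path (all share the seed weights).
    # `task_loss_fn` is a placeholder for the actual ASR criterion.
    loss = sum(task_loss_fn(l, targets) for l in logits.values())

    # KL self-distillation: the largest unfolded model is the (detached)
    # teacher, the smallest seed pass is the student.
    teacher = logits[max(unfold_depths)].detach()
    student = logits[min(unfold_depths)]
    kd = F.kl_div(F.log_softmax(student / temp, dim=-1),
                  F.softmax(teacher / temp, dim=-1),
                  reduction="batchmean") * (temp ** 2)
    return loss + kd_weight * kd
```

Under this reading, a single set of seed parameters is stored after training, and any of the supported logical depths can be instantiated at deployment time simply by choosing how many times to unfold, which is where the memory and storage savings come from.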

Page Count
5 pages

Category
Computer Science: Sound