Efficient Speech Translation through Model Compression and Knowledge Distillation

Published: May 26, 2025 | arXiv ID: 2505.20237v2

By: Yasmin Moslem

Potential Business Impact:

Makes translation apps smaller and faster.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Efficient deployment of large audio-language models for speech translation remains challenging due to their significant computational requirements. In this paper, we address this challenge through our system submissions to the "Model Compression" track at the International Conference on Spoken Language Translation (IWSLT 2025). We experiment with a combination of approaches including iterative layer pruning based on layer importance evaluation, low-rank adaptation with 4-bit quantization (QLoRA), and knowledge distillation. In our experiments, we use Qwen2-Audio-7B-Instruct for speech translation into German and Chinese. Our pruned (student) models achieve up to a 50% reduction in both model parameters and storage footprint, while retaining 97-100% of the translation quality of the in-domain (teacher) models.
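
The abstract names iterative layer pruning based on layer importance evaluation but does not spell out the procedure. As a rough illustration only, the sketch below shows one common way such a loop can be organized: at each step, every remaining layer is tentatively removed, the model is re-scored, and the layer whose removal hurts quality least is dropped. Everything here is an assumption made for illustration and is not taken from the paper: the toy scalar "layers", the `quality` metric (negative mean squared error against the unpruned model's outputs), and the helper names `run`, `quality`, and `prune_iteratively`. Any fine-tuning or distillation between pruning steps is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float], float]


def run(layers: Sequence[Layer], x: float) -> float:
    """Apply the layers in order to a scalar input (toy stand-in for a forward pass)."""
    for layer in layers:
        x = layer(x)
    return x


def quality(layers: Sequence[Layer], data: Sequence[Tuple[float, float]]) -> float:
    """Hypothetical quality metric: negative mean squared error, so higher is better."""
    return -sum((run(layers, x) - y) ** 2 for x, y in data) / len(data)


def prune_iteratively(
    layers: Sequence[Layer],
    data: Sequence[Tuple[float, float]],
    target_count: int,
) -> List[Layer]:
    """Greedily drop one layer per iteration until target_count layers remain."""
    kept = list(layers)
    while len(kept) > target_count:
        # The least important layer is the one whose removal leaves quality highest.
        least_important = max(
            range(len(kept)),
            key=lambda i: quality(kept[:i] + kept[i + 1:], data),
        )
        del kept[least_important]
    return kept


# Toy "model": two layers that matter and two that barely change the signal.
def double(x: float) -> float:
    return 2.0 * x


def tiny_shift(x: float) -> float:
    return x + 0.01


def add_one(x: float) -> float:
    return x + 1.0


def tiny_scale(x: float) -> float:
    return 1.001 * x


full_model = [double, tiny_shift, add_one, tiny_scale]

# Use the unpruned model's own outputs as reference targets.
inputs = [1.0, 2.0, 3.0]
data = [(x, run(full_model, x)) for x in inputs]

# Remove half of the layers.
pruned = prune_iteratively(full_model, data, target_count=2)
```

In this toy setting the two near-identity layers are removed and the two layers that shape the output survive. In a real system the score would presumably be a translation metric on held-out data, and each candidate removal would require a full evaluation pass, which is why the number of layers removed per iteration matters for cost.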

Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks

Computation and Language

Makes smart computer programs smaller and faster.

10 Jul 2025 1

90%

On Multilingual Encoder Language Model Compression for Low-Resource Languages

Computation and Language

Makes computer language programs much smaller.

22 May 2025 0

90%

Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications

Information Retrieval

Makes small AI models as smart as big ones.

20 Feb 2025 1

Page Count

10 pages
