Short-LVLM: Compressing and Accelerating Large Vision-Language Models by Pruning Redundant Layers
By: Ji Ma, Wei Suo, Peng Wang, and more
Potential Business Impact:
Makes AI understand pictures and words faster.
Although large vision-language models (LVLMs) have demonstrated impressive capabilities in multi-modal understanding and reasoning, their practical applications are still limited by massive parameter counts and high computational costs. Recent efforts in natural language processing (NLP) have shown the effectiveness of layer pruning, offering a promising training-free compression solution. However, due to the modality divergence between vision and language, it is unclear whether these NLP techniques remain effective for LVLMs. In this paper, we empirically show that directly applying these layer pruning methods to LVLMs is ineffective. Through extensive experiments, we find that non-essential vision-language (VL) tokens and inter-layer feature gaps pose critical challenges to pruning layers in LVLMs. Based on these insights, we propose a novel framework, Short-LVLM (SVL), which exploits important VL tokens and mitigates layer-wise feature gaps. Notably, Short-LVLM not only achieves a superior trade-off between performance and efficiency but also offers several practical advantages: it is training-free, model-agnostic, and highly compatible. The code for this work is publicly available at https://github.com/ASGO-MM/Short-LVLM.
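To make the pruning idea concrete, the sketch below shows one plausible way to score layer redundancy using only "important" VL tokens, in the spirit of similarity-based layer pruning from NLP. The attention-based token-importance proxy, the cosine-similarity redundancy score, and all function names here are illustrative assumptions, not the paper's exact procedure (which additionally addresses inter-layer feature gaps).

# Hypothetical sketch: rank LVLM layers for pruning by how little they change
# the features of important VL tokens. All names and heuristics are assumptions.
import torch

@torch.no_grad()
def select_important_tokens(hidden, attn, keep_ratio=0.25):
    # hidden: (batch, seq, dim); attn: (batch, heads, seq, seq)
    # Importance proxy: mean attention each token receives, averaged over heads and queries.
    importance = attn.mean(dim=1).mean(dim=1)                 # (batch, seq)
    k = max(1, int(keep_ratio * hidden.size(1)))
    idx = importance.topk(k, dim=-1).indices                  # (batch, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))   # (batch, k, dim)
    return torch.gather(hidden, 1, idx)                       # keep only important tokens

@torch.no_grad()
def layer_redundancy(h_in, h_out):
    # High input/output cosine similarity means the layer changes its features
    # very little, so it is a candidate for removal.
    sim = torch.nn.functional.cosine_similarity(h_in, h_out, dim=-1)  # (batch, k)
    return sim.mean().item()

@torch.no_grad()
def rank_prunable_layers(per_layer_hidden, per_layer_attn, keep_ratio=0.25):
    # per_layer_hidden: list of hidden states before/after each layer, (batch, seq, dim)
    # per_layer_attn:   list of attention maps per layer, (batch, heads, seq, seq)
    scores = []
    for i in range(len(per_layer_hidden) - 1):
        tok_in = select_important_tokens(per_layer_hidden[i], per_layer_attn[i], keep_ratio)
        tok_out = select_important_tokens(per_layer_hidden[i + 1], per_layer_attn[i], keep_ratio)
        scores.append((i, layer_redundancy(tok_in, tok_out)))
    # Most redundant layers (highest similarity) come first; prune from the top of this list.
    return sorted(scores, key=lambda s: s[1], reverse=True)

Because the redundancy score is computed from a single forward pass on cached hidden states and attention maps, this kind of ranking is training-free and model-agnostic, consistent with the properties claimed for Short-LVLM.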
Similar Papers
LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models
Computation and Language
Makes smart AI see and think faster.
LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit
CV and Pattern Recognition
Makes AI understand pictures and words better, faster.
Towards Lossless Ultimate Vision Token Compression for VLMs
CV and Pattern Recognition
Makes AI understand pictures much faster.