Towards Efficient VLMs: Information-Theoretic Driven Compression via Adaptive Structural Pruning
By: Zhaoqi Xu, Yingying Zhang, Jian Li, and more
Potential Business Impact:
Makes AI models smaller and faster.
Recent advances in vision-language models (VLMs) have shown remarkable performance across multimodal tasks, yet their ever-growing scale poses severe challenges for deployment and efficiency. Existing compression methods often rely on heuristic importance metrics or empirical pruning rules and lack theoretical guarantees on information preservation. In this work, we propose InfoPrune, an information-theoretic framework for adaptive structural compression of VLMs. Grounded in the Information Bottleneck principle, we formulate pruning as a trade-off between retaining task-relevant semantics and discarding redundant dependencies. To quantify the contribution of each attention head, we introduce an entropy-based effective rank (eRank) and employ the Kolmogorov–Smirnov (KS) distance to measure the divergence between the original and compressed structures. This yields a unified criterion that jointly accounts for structural sparsity and informational efficiency. Building on this foundation, we design two complementary schemes: (1) training-based head pruning guided by the proposed information-loss objective, and (2) training-free FFN compression via adaptive low-rank approximation. Extensive experiments on VQAv2, TextVQA, and GQA show that InfoPrune achieves up to a 3.2× reduction in FLOPs and a 1.8× speedup with negligible performance degradation, marking a theoretically grounded and practically effective step toward efficient multimodal large models.
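The two head-scoring quantities named in the abstract are standard enough to sketch. The snippet below is a minimal illustration, assuming eRank follows the usual effective-rank definition (the exponential of the Shannon entropy of the normalized singular-value spectrum) and that the KS term is the two-sample Kolmogorov–Smirnov statistic between activation samples of the original and compressed models; the paper's exact estimators may differ.

```python
# Minimal sketch of the abstract's two scoring quantities. Assumptions:
# eRank is the standard effective rank (exp of the Shannon entropy of the
# normalized singular-value spectrum), and the KS term is the two-sample
# Kolmogorov-Smirnov statistic between activation samples of the original
# and compressed models. The paper's exact estimators may differ.
import numpy as np
from scipy.stats import ks_2samp

def erank(weight: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a weight (or activation) matrix."""
    s = np.linalg.svd(weight, compute_uv=False)
    p = s / (s.sum() + eps)                 # normalize spectrum to a distribution
    entropy = -(p * np.log(p + eps)).sum()  # Shannon entropy of the spectrum
    return float(np.exp(entropy))           # eRank = exp(H); 1 <= eRank <= rank

def ks_distance(original_acts: np.ndarray, compressed_acts: np.ndarray) -> float:
    """KS statistic between flattened activation samples of the two models."""
    return ks_2samp(original_acts.ravel(), compressed_acts.ravel()).statistic
```

Under this reading, a head whose projection has low eRank carries few effective directions of information and is a natural pruning candidate, while the KS distance bounds how far the compressed model's activation distribution has drifted from the original.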
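The training-free FFN branch can likewise be sketched as truncated SVD with a per-matrix rank chosen adaptively. The energy-threshold rule below is an illustrative assumption, not the paper's stated criterion.

```python
# Sketch of training-free low-rank FFN compression in the spirit the abstract
# describes: factor each FFN weight with a truncated SVD, choosing the rank
# adaptively per matrix. The 95%-energy rule is an illustrative assumption.
import numpy as np

def low_rank_factor(weight: np.ndarray, energy: float = 0.95):
    """Return (A, B) with weight ~= A @ B, where the rank is the smallest r
    keeping `energy` fraction of the squared singular-value mass."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(cum, energy)) + 1  # smallest rank reaching the threshold
    a = u[:, :r] * s[:r]                       # shape (d_out, r)
    b = vt[:r, :]                              # shape (r, d_in)
    return a, b

# Replacing W (d_out x d_in) with A @ B cuts one matmul's cost from
# d_out * d_in to r * (d_out + d_in) multiply-adds per token when r is small.
```

Because the factorization needs no gradient updates, it can be applied post hoc to a pretrained checkpoint, which matches the "training-free" framing in the abstract.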
Similar Papers
All You Need Are Random Visual Tokens? Demystifying Token Pruning in VLLMs
CV and Pattern Recognition
Makes AI see images faster by ignoring useless parts.
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
CV and Pattern Recognition
Cuts 93% of image junk to keep AI sharp.
VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm
CV and Pattern Recognition
Makes AI understand pictures faster on phones.