Mitigating Information Loss under High Pruning Rates for Efficient Large Vision Language Models
By: Mingyu Fu, Wei Suo, Ji Ma, and more
Potential Business Impact:
Makes smart AI understand pictures faster and cheaper.
Despite the great success of Large Vision Language Models (LVLMs), their high computational cost severely limits their broad application. This cost stems mainly from the visual sequence of the input, which consists of hundreds or even thousands of tokens. Although existing methods have made progress by removing redundant tokens, they suffer severe performance degradation at high pruning rates due to the loss of visual information. In this paper, we propose an Adaptive Content Compensation Method (ACCM), which effectively mitigates this visual information loss via an image caption. Specifically, ACCM comprises two key components: a lightweight caption model and a selector. First, the caption model generates question-related descriptions under the guidance of the user instruction. The selector then identifies a contextually appropriate caption from the multiple candidates. Leveraging self-supervised learning, our modules can be trained efficiently without any human or automated labeling. We conduct extensive experiments across seven benchmarks, and the results show that ACCM significantly outperforms existing methods with lower FLOPs (e.g., surpassing the SOTA by 20.6% with 6.5% fewer FLOPs).
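To make the prune-then-compensate idea concrete, here is a minimal sketch of the pipeline the abstract describes, under stated assumptions: the pruning criterion (feature norm), the `caption_model.generate`, `selector`, and `lvlm` interfaces, and all tensor shapes are hypothetical placeholders for illustration, not the authors' implementation.

```python
import torch

def accm_forward(lvlm, caption_model, selector,
                 visual_tokens, instruction_ids,
                 keep_ratio=0.1, num_candidates=4):
    """Prune visual tokens, then compensate the loss with a selected caption.

    visual_tokens: (B, N, D) float tensor of visual features.
    instruction_ids: tokenized user instruction.
    """
    # 1) Token pruning: keep the top-k visual tokens by a simple
    #    importance score (feature norm here; the paper's actual
    #    criterion may differ).
    scores = visual_tokens.norm(dim=-1)                    # (B, N)
    k = max(1, int(keep_ratio * visual_tokens.size(1)))
    idx = scores.topk(k, dim=1).indices                    # (B, k)
    pruned = torch.gather(
        visual_tokens, 1,
        idx.unsqueeze(-1).expand(-1, -1, visual_tokens.size(-1)))

    # 2) Content compensation: the lightweight caption model produces
    #    several instruction-conditioned candidate captions (e.g. by
    #    sampling) from the full, unpruned visual sequence.
    candidates = [caption_model.generate(visual_tokens, instruction_ids)
                  for _ in range(num_candidates)]

    # 3) Selection: score each candidate against the instruction and
    #    keep the most contextually appropriate one.
    sel_scores = torch.stack(
        [selector(c, instruction_ids) for c in candidates])
    best = candidates[sel_scores.argmax().item()]

    # 4) The LVLM consumes the pruned visual tokens plus the caption,
    #    recovering information lost at high pruning rates.
    return lvlm(visual_tokens=pruned, caption_ids=best,
                instruction_ids=instruction_ids)
```

The key design point this sketch illustrates is that compensation happens in the text channel: a short caption is far cheaper for the LVLM to process than the hundreds of visual tokens it replaces.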
Similar Papers
CoViPAL: Layer-wise Contextualized Visual Token Pruning for Large Vision-Language Models
CV and Pattern Recognition
Makes AI understand pictures faster and cheaper.
Towards Lossless Ultimate Vision Token Compression for VLMs
CV and Pattern Recognition
Makes AI understand pictures much faster.