LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit
By: Chengtao Lv, Bilang Zhang, Yang Yong and more
Potential Business Impact:
Makes AI understand pictures and words better, faster.
Large Vision-Language Models (VLMs) exhibit impressive multi-modal capabilities but suffer from prohibitive computational and memory demands, due to their long visual token sequences and massive parameter sizes. To address these issues, recent works have proposed training-free compression methods. However, existing efforts often suffer from three major limitations: (1) Current approaches do not decompose techniques into comparable modules, hindering fair evaluation across spatial and temporal redundancy. (2) Evaluation is confined to simple single-turn tasks, failing to reflect performance in realistic scenarios. (3) Individual compression techniques are used in isolation, without exploring their joint potential. To overcome these gaps, we introduce LLMC+, a comprehensive VLM compression benchmark with a versatile, plug-and-play toolkit. LLMC+ supports over 20 algorithms across five representative VLM families and enables systematic study of token-level and model-level compression. Our benchmark reveals that: (1) Spatial and temporal redundancies demand distinct technical strategies. (2) Token reduction methods degrade significantly in multi-turn dialogue and detail-sensitive tasks. (3) Combining token and model compression achieves extreme compression with minimal performance loss. We believe LLMC+ will facilitate fair evaluation and inspire future research in efficient VLMs. Our code is available at https://github.com/ModelTC/LightCompress.
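To make the distinction between token-level and model-level compression concrete, here is a minimal sketch in plain Python. It is not the LLMC+ API and none of the names below come from the toolkit: `prune_visual_tokens`, `quantize_int8`, and `dequantize` are hypothetical helpers, the importance scores are assumed to be given (for example, derived from attention), and symmetric per-tensor int8 quantization is only one simple stand-in for model compression. The actual algorithms and interfaces are in the repository linked above.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def prune_visual_tokens(tokens: Sequence[T], scores: Sequence[float],
                        keep_ratio: float) -> List[T]:
    """Token-level compression (illustrative): keep the highest-scoring
    visual tokens and preserve their original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(tokens) * keep_ratio))
    # Indices of the k largest scores, then restored to sequence order.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Model-level compression (illustrative): symmetric per-tensor int8
    quantization. Returns the integer codes and the scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    # Hypothetical inputs: four visual tokens with made-up importance scores.
    kept = prune_visual_tokens(["t0", "t1", "t2", "t3"], [0.1, 0.9, 0.3, 0.8], 0.5)
    codes, scale = quantize_int8([0.5, -1.0, 0.25])
    print(kept, codes, scale)
```

The two steps are independent: pruning shortens the visual token sequence the model has to process, while quantization shrinks the weights themselves. That independence is why the two families can be stacked, as the abstract's third finding suggests.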
Similar Papers
Learning Free Token Reduction for Multi-Modal Large Language Models
CV and Pattern Recognition
Makes AI understand videos faster and cheaper.
Towards Lossless Ultimate Vision Token Compression for VLMs
CV and Pattern Recognition
Makes AI understand pictures much faster.
Benchmarking and Enhancing VLM for Compressed Image Understanding
CV and Pattern Recognition
Helps computers understand blurry pictures better.