Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models
By: Jaemin Son, Sujin Choi, Inyong Yun
Potential Business Impact:
Makes AI understand documents faster and cheaper.
Recent progress in vision-language models (VLMs) has led to impressive results on document understanding tasks, but their high computational demands remain a challenge. To mitigate this computational burden, we propose a lightweight token pruning framework that filters out non-informative background regions from document images prior to VLM processing. A binary patch-level classifier removes non-text areas, and a max-pooling refinement step recovers fragmented text regions to enhance spatial coherence. Experiments on real-world document datasets demonstrate that our approach substantially lowers computational cost while maintaining comparable accuracy.
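The abstract's pipeline can be sketched concretely: score each image patch with a lightweight binary text/background classifier, dilate the resulting binary mask with a max-pooling pass so fragmented text regions recover their neighbors, and keep the surviving patches together with their original grid indices so the VLM's positional embeddings still apply. The PyTorch sketch below is a minimal illustration under those assumptions; the classifier architecture, patch dimension, threshold, and pooling kernel size are hypothetical choices, not the authors' released implementation.

```python
# Minimal sketch of the pruning pipeline described in the abstract.
# Module names, sizes, and thresholds are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchTextClassifier(nn.Module):
    """Lightweight binary classifier: text vs. background, per patch."""
    def __init__(self, patch_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(patch_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # logit > 0 means "text"
        )

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        return self.net(patches).squeeze(-1)  # (N,) logits

def prune_patches(patches, grid_h, grid_w, classifier, pool_k=3, thresh=0.5):
    """Keep text patches (plus max-pooled neighbors) and their original indices.

    patches: (N, D) flattened patch embeddings, N = grid_h * grid_w.
    Returns (kept_patches, kept_indices); the indices let the VLM reuse
    its original positional embeddings, preserving spatial layout.
    """
    probs = torch.sigmoid(classifier(patches))                 # (N,)
    mask = (probs > thresh).float().view(1, 1, grid_h, grid_w)
    # Max-pooling refinement: dilate the binary text mask so
    # fragmented text regions regain their pruned neighbors.
    mask = F.max_pool2d(mask, kernel_size=pool_k, stride=1,
                        padding=pool_k // 2)
    kept = mask.view(-1).bool()                                # (N,)
    kept_indices = kept.nonzero(as_tuple=True)[0]              # original grid indices
    return patches[kept_indices], kept_indices

# Example: a 24x24 patch grid with 768-dim embeddings (both hypothetical).
clf = PatchTextClassifier(patch_dim=768)
patches = torch.randn(24 * 24, 768)
kept, idx = prune_patches(patches, 24, 24, clf)
print(kept.shape, idx.shape)  # only surviving patches reach the VLM
```

Returning kept_indices alongside the pruned patches is what would make such pruning index-preserving: downstream layers can still look up each token's original positional embedding, so the document's spatial layout survives even though most background tokens are discarded.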
Similar Papers
PoRe: Position-Reweighted Visual Token Pruning for Vision Language Models
Computer Vision and Pattern Recognition
Helps AI focus on important parts of pictures.
Training-Free Pyramid Token Pruning for Efficient Large Vision-Language Models via Region, Token, and Instruction-Guided Importance
Computer Vision and Pattern Recognition
Focuses on important image parts for faster AI.
VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm
Computer Vision and Pattern Recognition
Makes AI understand pictures faster on phones.