TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing
By: Jongha Kim, Minseong Bae, Sanghyeok Lee, and more
Potential Business Impact:
Helps computers understand charts and tables faster.
Table images present unique challenges for effective and efficient understanding due to the need for question-specific focus and the presence of redundant background regions. Existing Multimodal Large Language Model (MLLM) approaches often overlook these characteristics, resulting in uninformative and redundant visual representations. To address these issues, we aim to generate visual features that are both informative and compact to improve table understanding. We first propose progressive question conditioning, which injects the question into Vision Transformer layers with gradually increasing frequency, considering each layer's capacity to handle additional information, to generate question-aware visual features. To reduce redundancy, we introduce a pruning strategy that discards background tokens, thereby improving efficiency. To mitigate information loss from pruning, we further propose token focusing, a training strategy that encourages the model to concentrate essential information in the retained tokens. Combining these approaches, we present TabFlash, an efficient and effective MLLM for table understanding. TabFlash achieves state-of-the-art performance, outperforming both open-source and proprietary MLLMs, while requiring 27% fewer FLOPs and 30% less memory than the second-best MLLM.
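The two efficiency mechanisms in the abstract can be illustrated with a minimal sketch. The function names, the quadratic spacing of injection points, and the use of per-token scores are illustrative assumptions, not the paper's actual implementation: `injection_schedule` places question-injection points so they become denser in later Vision Transformer layers (gradually increasing frequency), and `prune_background_tokens` keeps only the highest-scoring visual tokens, treating the rest as background.

```python
def injection_schedule(num_layers, num_injections):
    """Pick which ViT layers receive question features.

    Hypothetical schedule: injection points are spaced as 1 - (1 - t)^2,
    so gaps between injections shrink toward the final layers, i.e.
    the question is injected with gradually increasing frequency.
    """
    ts = [(k + 1) / num_injections for k in range(num_injections)]
    positions = {round((1 - (1 - t) ** 2) * (num_layers - 1)) for t in ts}
    return sorted(min(num_layers - 1, p) for p in positions)


def prune_background_tokens(tokens, scores, keep_ratio=0.5):
    """Drop low-scoring visual tokens as background.

    `scores` stands in for any question-conditioned importance measure
    (an assumption here); the retained tokens keep their spatial order.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    by_score = sorted(range(len(tokens)), key=lambda i: -scores[i])
    keep_idx = sorted(by_score[:n_keep])
    return [tokens[i] for i in keep_idx], keep_idx
```

For a 24-layer encoder with 4 injections, the schedule yields layers whose spacing tightens toward the output, matching the "gradually increasing frequency" idea; pruning with `keep_ratio=0.5` halves the token count fed to the language model, which is where the FLOP and memory savings would come from.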
Similar Papers
TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition
CV and Pattern Recognition
Teaches computers to read tables without examples.
Table as a Modality for Large Language Models
Computation and Language
Helps computers understand charts and tables better.
Table Comprehension in Building Codes using Vision Language Models and Domain-Specific Fine-Tuning
Computation and Language
Helps computers understand building rules from pictures.