Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data
By: Ekaterina Borisova, Fabio Barth, Nils Feldhus and more
Potential Business Impact:
Computers understand data in tables better.
Tables are among the most widely used tools for representing structured data in research, business, medicine, and education. Although LLMs demonstrate strong performance in downstream tasks, their efficiency in processing tabular data remains underexplored. In this paper, we investigate the effectiveness of both text-based and multimodal LLMs on table understanding tasks through a cross-domain and cross-modality evaluation. Specifically, we compare their performance on tables from scientific vs. non-scientific contexts and examine their robustness on tables represented as images vs. text. Additionally, we conduct an interpretability analysis to measure context usage and input relevance. We also introduce the TableEval benchmark, comprising 3017 tables from scholarly publications, Wikipedia, and financial reports, where each table is provided in five different formats: Image, Dictionary, HTML, XML, and LaTeX. Our findings indicate that while LLMs maintain robustness across table modalities, they face significant challenges when processing scientific tables.
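To make the four text formats named in the abstract concrete, here is a minimal sketch of one small table serialized as Dictionary, HTML, XML, and LaTeX. The toy table, the function names, and the exact schemas (for example the `header`/`rows` keys and the `row`/`cell` XML elements) are assumptions for illustration only; the benchmark's actual serialization may differ. The Image format is omitted because it requires rendering.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical toy table; not taken from TableEval.
HEADER = ["Model", "Accuracy"]
ROWS = [["A", "0.71"], ["B", "0.64"]]


def to_dictionary(header, rows):
    """Return the table as a plain dict (assumed schema: header + rows)."""
    return {"header": list(header), "rows": [list(r) for r in rows]}


def to_html(header, rows):
    """Return the table as a single-line HTML <table> string."""
    head = "".join(f"<th>{html.escape(h)}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def to_xml(header, rows):
    """Return the table as XML, one <row> per data row (assumed element names)."""
    table = ET.Element("table")
    for row in rows:
        row_el = ET.SubElement(table, "row")
        for name, cell in zip(header, row):
            cell_el = ET.SubElement(row_el, "cell", {"column": name})
            cell_el.text = cell
    return ET.tostring(table, encoding="unicode")


def to_latex(header, rows):
    """Return the table as a LaTeX tabular; cell text is not escaped here."""
    lines = ["\\begin{tabular}{" + "l" * len(header) + "}"]
    lines.append(" & ".join(header) + " \\\\")
    lines.append("\\hline")
    lines.extend(" & ".join(row) + " \\\\" for row in rows)
    lines.append("\\end{tabular}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dictionary(HEADER, ROWS))
    print(to_html(HEADER, ROWS))
    print(to_xml(HEADER, ROWS))
    print(to_latex(HEADER, ROWS))
```

The same underlying cells reach a model in visibly different surface forms, which is the kind of variation a cross-format comparison has to control for.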
Similar Papers
Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning
Computation and Language
Helps computers understand science tables better.
Format Matters: The Robustness of Multimodal LLMs in Reviewing Evidence from Tables and Charts
Computation and Language
Helps computers check science facts from charts.
Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables
Artificial Intelligence
Helps computers understand chemistry tables better.