From Rows to Reasoning: A Retrieval-Augmented Multimodal Framework for Spreadsheet Understanding
By: Anmol Gulati, Sahil Sen, Waqar Sarguroh, and more
Large Language Models (LLMs) struggle to reason over large-scale enterprise spreadsheets containing thousands of numeric rows, multiple linked sheets, and embedded visual content such as charts and receipts. Prior state-of-the-art spreadsheet reasoning approaches typically rely on single-sheet compression or full-context encoding, which limits scalability and fails to reflect how real users interact with complex, multimodal workbooks. We introduce FRTR-Bench, the first large-scale benchmark for multimodal spreadsheet reasoning, comprising 30 enterprise-grade Excel workbooks spanning nearly four million cells and more than 50 embedded images. To address these challenges, we present From Rows to Reasoning (FRTR), a multimodal retrieval-augmented generation framework that decomposes Excel workbooks into granular row, column, and block embeddings, employs hybrid lexical-dense retrieval with Reciprocal Rank Fusion (RRF), and integrates multimodal embeddings to reason over both numerical and visual information. We evaluate FRTR with six LLMs: with Claude Sonnet 4.5 it achieves 74% answer accuracy on FRTR-Bench, a substantial improvement over prior state-of-the-art approaches that reach only 24%. On the SpreadsheetLLM benchmark, FRTR achieves 87% accuracy with GPT-5 while reducing token usage by roughly 50% compared to context-compression methods.
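To make the hybrid-retrieval step concrete, the sketch below shows standard Reciprocal Rank Fusion (Cormack et al., 2009), the fusion method the abstract names, combining a lexical ranking with a dense-embedding ranking over spreadsheet chunks. The chunk IDs, the two retrievers, and the constant k=60 are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk IDs via Reciprocal Rank Fusion.

    rankings: list of ranked lists (best first), e.g. one from a lexical
    retriever such as BM25 and one from a dense embedding retriever.
    k: smoothing constant; 60 follows the original RRF paper -- the
    setting FRTR actually uses is not stated in the abstract.
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # A chunk scores higher the earlier it appears across more rankings.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical row/column/block chunk IDs for illustration only.
lexical_hits = ["row_102", "block_7", "col_revenue"]   # lexical order
dense_hits   = ["block_7", "col_revenue", "row_884"]   # embedding order
print(reciprocal_rank_fusion([lexical_hits, dense_hits]))
```

RRF is attractive here because it needs only ranks, not comparable scores, so the lexical and dense retrievers can be fused without score calibration.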
Similar Papers
FinMMDocR: Benchmarking Financial Multimodal Reasoning with Scenario Awareness, Document Understanding, and Multi-Step Computation
CV and Pattern Recognition
Teaches computers to understand money documents.
FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging
CV and Pattern Recognition
Tests computers on money math with pictures.
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Computation and Language
Makes AI better at thinking with pictures and words.