DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM

Published: December 11, 2025 | arXiv ID: 2512.10619v1

By: Qintong Zhang, Junyuan Zhang, Zhifei Ren and more

Potential Business Impact:

Detects and categorizes errors in document parsing output more accurately than the commercial and open-source models it was compared against.

Business Areas:

Image Recognition, Data and Analytics, Software

Document parsing aims to transform unstructured PDF images into semi-structured data, facilitating the digitization and utilization of information in diverse domains. While vision language models (VLMs) have significantly advanced this task, achieving reliable, high-quality parsing in real-world scenarios remains challenging. Common practice often selects the top-performing model on standard benchmarks. However, these benchmarks may carry dataset-specific biases, leading to inconsistent model rankings and limited correlation with real-world performance. Moreover, benchmark metrics typically provide only overall scores, which can obscure distinct error patterns in output. This raises a key challenge: how can we reliably and comprehensively assess document parsing quality in the wild? We address this problem with DOCR-Inspector, which formalizes document parsing assessment as fine-grained error detection and analysis. Leveraging VLM-as-a-Judge, DOCR-Inspector analyzes a document image and its parsed output, identifies all errors, assigns them to one of 28 predefined types, and produces a comprehensive quality assessment. To enable this capability, we construct DOCRcase-200K for training and propose the Chain-of-Checklist reasoning paradigm to enable the hierarchical structure of parsing quality assessment. For empirical validation, we introduce DOCRcaseBench, a set of 882 real-world document parsing cases with manual annotations. On this benchmark, DOCR-Inspector-7B outperforms commercial models like Gemini 2.5 Pro, as well as leading open-source models. Further experiments demonstrate that its quality assessments provide valuable guidance for parsing results refinement, making DOCR-Inspector both a practical evaluator and a driver for advancing document parsing systems at scale. Model and code are released at: https://github.com/ZZZZZQT/DOCR-Inspector.
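
The abstract describes the evaluator's output as a list of detected errors, each assigned to a predefined type, which is then rolled up into an overall quality assessment. The sketch below illustrates only that aggregation step, under stated assumptions: the `DetectedError` schema, the `build_report` function, and the error-type labels are hypothetical placeholders, not the paper's 28 types or the released code's API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedError:
    """One error a judge model reports for a parsed document (hypothetical schema)."""
    error_type: str  # placeholder label; the paper's 28 types are not listed here
    element: str     # illustrative values such as "text", "table", "formula"


def build_report(errors: list[DetectedError]) -> dict:
    """Aggregate per-error findings into a single quality report."""
    by_type = Counter(e.error_type for e in errors)
    by_element = Counter(e.element for e in errors)
    return {
        "is_clean": not errors,          # True only when no errors were found
        "total_errors": len(errors),
        "errors_by_type": dict(by_type),
        "errors_by_element": dict(by_element),
    }


# Made-up findings for one parsed page, used only to exercise the function.
example = [
    DetectedError("missing_text", "text"),
    DetectedError("missing_text", "text"),
    DetectedError("wrong_cell_merge", "table"),
]
report = build_report(example)
```

Keeping per-type counts rather than a single score reflects the abstract's point that overall benchmark scores can obscure distinct error patterns.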

Unblocking Fine-Grained Evaluation of Detailed Captions: An Explaining AutoRater and Critic-and-Revise Pipeline

Computation and Language

Checks if computer descriptions of pictures are true.

9 Jun 2025 1

88%

Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs

CV and Pattern Recognition

Lets computers understand math in papers.

10 Dec 2025 2

88%

DocRefine: An Intelligent Framework for Scientific Document Understanding and Content Optimization based on Multimodal Large Model Agents

CV and Pattern Recognition

Helps computers understand and change science papers.

9 Aug 2025 1

Page Count

39 pages
