Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs
By: Pius Horn, Janis Keuper
Correctly parsing mathematical formulas from PDFs is critical for training large language models and building scientific knowledge bases from academic literature, yet existing benchmarks either exclude formulas entirely or lack semantically aware evaluation metrics. We introduce a novel benchmarking framework centered on synthetically generated PDFs with precise LaTeX ground truth, enabling systematic control over layout, formulas, and content characteristics. A key methodological contribution is the use of LLM-as-a-judge for semantic formula assessment, combined with a robust two-stage matching pipeline that handles parser output inconsistencies. Through human validation on 250 formula pairs (750 ratings from 30 evaluators), we demonstrate that LLM-based evaluation achieves substantially higher correlation with human judgment (Pearson r = 0.78) than CDM (r = 0.34) and text similarity (r ≈ 0). Evaluating 20+ contemporary PDF parsers (including specialized OCR models, vision-language models, and rule-based approaches) across 100 synthetic documents with 2,000+ formulas reveals significant performance disparities. Our findings provide crucial insights for practitioners selecting parsers for downstream applications and establish a robust, scalable methodology that enables reproducible evaluation of PDF formula extraction quality. Code and benchmark data: https://github.com/phorn1/pdf-parse-bench
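To make the evaluation idea concrete, below is a minimal sketch of how an LLM-as-a-judge formula scorer and the correlation check against human ratings could be wired up. This is not the benchmark's actual implementation: the prompt wording, the 0–5 scale, and the `call_llm` placeholder are assumptions, and only the Pearson-correlation step mirrors the human validation described in the abstract.

```python
"""Sketch of LLM-as-a-judge scoring for formula extraction (assumptions,
not the benchmark's code). Requires scipy for the correlation check."""

from scipy.stats import pearsonr

# Hypothetical judge prompt; the paper does not publish its exact wording here.
JUDGE_PROMPT = """You are grading a PDF parser's LaTeX output against ground truth.
Ground truth:  {gt}
Parser output: {pred}
Rate the semantic equivalence of the two formulas from 0 (completely wrong)
to 5 (mathematically identical, ignoring purely notational differences).
Answer with a single integer."""


def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API (OpenAI, Anthropic, a local
    model, ...); swap in the client of your choice."""
    raise NotImplementedError


def judge_formula(ground_truth: str, prediction: str) -> float:
    """Ask the judge model for a 0-5 semantic-equivalence score."""
    reply = call_llm(JUDGE_PROMPT.format(gt=ground_truth, pred=prediction))
    return float(reply.strip())


def correlate_with_humans(llm_scores: list[float],
                          human_scores: list[float]) -> float:
    """Pearson r between judge scores and (averaged) human ratings,
    analogous to the paper's validation on 250 formula pairs."""
    r, _p = pearsonr(llm_scores, human_scores)
    return r
```

In such a setup, each matched ground-truth/parser-output pair would be scored by `judge_formula`, and `correlate_with_humans` would then report how closely the automatic metric tracks human judgment.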