
MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns

Published: November 13, 2025 | arXiv ID: 2511.10390v1

By: Jiarui Zhang, Yuliang Liu, Zijun Wu, and others

Potential Business Impact:

Reads messy, complex documents (multi-level tables, embedded images, tables split across pages) more reliably.

Business Areas:

Image Recognition, Data and Analytics, Software

Document parsing is a core task in document intelligence, supporting applications such as information extraction, retrieval-augmented generation, and automated document analysis. However, real-world documents often feature complex layouts with multi-level tables, embedded images or formulas, and cross-page structures, which remain challenging for existing OCR systems. We introduce MonkeyOCR v1.5, a unified vision-language framework that enhances both layout understanding and content recognition through a two-stage parsing pipeline. The first stage employs a large multimodal model to jointly predict document layout and reading order, leveraging visual information to ensure structural and sequential consistency. The second stage performs localized recognition of text, formulas, and tables within detected regions, maintaining high visual fidelity while reducing error propagation. To address complex table structures, we propose a visual consistency-based reinforcement learning scheme that evaluates recognition quality via render-and-compare alignment, improving structural accuracy without manual annotations. Additionally, two specialized modules, Image-Decoupled Table Parsing and Type-Guided Table Merging, are introduced to enable reliable parsing of tables containing embedded images and reconstruction of tables crossing pages or columns. Comprehensive experiments on OmniDocBench v1.5 demonstrate that MonkeyOCR v1.5 achieves state-of-the-art performance, outperforming PPOCR-VL and MinerU 2.5 while showing exceptional robustness in visually complex document scenarios.
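The abstract does not say how the rendered prediction is compared with the source image, so the sketch below only illustrates the render-and-compare idea under clearly stated assumptions. It uses a coarse stand-in for a real renderer. A predicted HTML table is laid out on a logical grid (honoring `rowspan`/`colspan`) and "rendered" as the set of internal ruling-line segments. The reward is the intersection-over-union (IoU) of that set with the line segments observed in the table crop. All names (`render_borders`, `consistency_reward`) are hypothetical. The observed line set is assumed to come from a line detector that is not shown. This is not the authors' reward function.

```python
from html.parser import HTMLParser


class _SpanCollector(HTMLParser):
    """Collects (rowspan, colspan) for each <td>/<th>, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[tuple[int, int]]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            self.rows[-1].append((int(a.get("rowspan") or 1), int(a.get("colspan") or 1)))


def layout_grid(html: str) -> dict[tuple[int, int], int]:
    """Map each (row, col) grid slot to the id of the cell covering it."""
    parser = _SpanCollector()
    parser.feed(html)
    grid: dict[tuple[int, int], int] = {}
    cell_id = 0
    for r, cells in enumerate(parser.rows):
        c = 0
        for rowspan, colspan in cells:
            while (r, c) in grid:  # skip slots already filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = cell_id
            cell_id += 1
            c += colspan
    return grid


def render_borders(html: str) -> set[tuple[str, int, int]]:
    """Hypothetical 'renderer': internal ruling-line segments between distinct cells."""
    grid = layout_grid(html)
    borders: set[tuple[str, int, int]] = set()
    for (r, c), cid in grid.items():
        right, below = grid.get((r, c + 1)), grid.get((r + 1, c))
        if right is not None and right != cid:
            borders.add(("v", r, c))  # vertical line right of slot (r, c)
        if below is not None and below != cid:
            borders.add(("h", r, c))  # horizontal line below slot (r, c)
    return borders


def consistency_reward(pred_html: str, observed: set[tuple[str, int, int]]) -> float:
    """IoU between the rendered prediction and lines observed in the image."""
    rendered = render_borders(pred_html)
    union = rendered | observed
    return len(rendered & observed) / len(union) if union else 1.0


merged = '<table><tr><td colspan="2">Q1</td></tr><tr><td>a</td><td>b</td></tr></table>'
unmerged = '<table><tr><td>Q1</td><td></td></tr><tr><td>a</td><td>b</td></tr></table>'
observed = render_borders(merged)  # stand-in for ruling lines detected in the crop
print(consistency_reward(merged, observed))    # 1.0
print(consistency_reward(unmerged, observed))  # 0.75
```

In this example, dropping the `colspan` costs a quarter of the reward because the prediction adds one spurious vertical line. A real render-and-compare reward would compare rendered pixels, or use a learned similarity, rather than logical segments. That would also capture cell-content errors, which this structural proxy ignores.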

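The abstract names Type-Guided Table Merging but does not describe it. The following is therefore a hypothetical sketch of what "type-guided" merging of a table split across pages or columns could look like: first classify the kind of break, then apply a merge rule specific to that type. The three break types, the heuristics in `classify_break`, and the list-of-rows table representation are illustrative assumptions, not the paper's taxonomy or predictor.

```python
from enum import Enum

# Hypothetical representation: a table is a list of rows of cell strings.
Table = list[list[str]]


class BreakType(Enum):
    REPEATED_HEADER = "repeated_header"  # continuation repeats the header row
    PLAIN = "plain"                      # continuation starts directly with body rows
    SPLIT_ROW = "split_row"              # one row is cut across the break


def classify_break(top: Table, bottom: Table) -> BreakType:
    """Illustrative heuristics standing in for a learned type predictor."""
    if bottom[0] == top[0]:
        return BreakType.REPEATED_HEADER
    if bottom[0] and bottom[0][0].strip() == "":
        # An empty leading (key) cell suggests the previous row continues here.
        return BreakType.SPLIT_ROW
    return BreakType.PLAIN


def merge_tables(top: Table, bottom: Table) -> Table:
    """Merge two fragments of one logical table according to the break type."""
    if not top or not bottom:
        return top + bottom
    if len(top[0]) != len(bottom[0]):
        raise ValueError("fragments have different column counts")
    kind = classify_break(top, bottom)
    if kind is BreakType.REPEATED_HEADER:
        return top + bottom[1:]  # drop the duplicated header row
    if kind is BreakType.SPLIT_ROW:
        joined = [f"{a} {b}".strip() for a, b in zip(top[-1], bottom[0])]
        return top[:-1] + [joined] + bottom[1:]
    return top + bottom


page1 = [["Item", "Note"], ["A", "cut across"]]
page2 = [["", "the page"], ["B", "ok"]]
print(merge_tables(page1, page2))
# [['Item', 'Note'], ['A', 'cut across the page'], ['B', 'ok']]
```

Separating the type decision from the merge rule keeps each rule small. In practice, the classifier would also need visual cues, such as whether a ruling line closes the first fragment. This text-only sketch ignores those cues.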

Page Count: 17 pages
