Score: 4

PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing

Published: January 29, 2026 | arXiv ID: 2601.21957v1

By: Cheng Cui, Ting Sun, Suyin Liang and more

BigTech Affiliations: Baidu

Potential Business Impact:

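Parses documents more robustly under real-world distortions such as skew, warping, and uneven illumination, and adds seal recognition and text spotting.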

Business Areas:
Image Recognition Data and Analytics, Software

We introduce PaddleOCR-VL-1.5, an upgraded model achieving a new state-of-the-art (SOTA) accuracy of 94.5% on OmniDocBench v1.5. To rigorously evaluate robustness against real-world physical distortions, including scanning, skew, warping, screen-photography, and illumination, we propose the Real5-OmniDocBench benchmark. Experimental results demonstrate that this enhanced model attains SOTA performance on the newly curated benchmark. Furthermore, we extend the model's capabilities by incorporating seal recognition and text spotting tasks, while remaining a 0.9B ultra-compact VLM with high efficiency. Code: https://github.com/PaddlePaddle/PaddleOCR

Country of Origin
🇨🇳 China


Page Count
46 pages

Category
Computer Science: Computer Vision and Pattern Recognition