
PaddleOCR 3.0 Technical Report

Published: July 8, 2025 | arXiv ID: 2507.05595v1

By: Cheng Cui, Ting Sun, Manhui Lin and more

BigTech Affiliations: Baidu

Potential Business Impact:

Reads text in pictures and understands documents.

Business Areas:
Image Recognition, Data and Analytics, Software

This technical report introduces PaddleOCR 3.0, an Apache-licensed open-source toolkit for OCR and document parsing. To address the growing demand for document understanding in the era of large language models, PaddleOCR 3.0 presents three major solutions: (1) PP-OCRv5 for multilingual text recognition, (2) PP-StructureV3 for hierarchical document parsing, and (3) PP-ChatOCRv4 for key information extraction. With fewer than 100 million parameters, these models achieve accuracy and efficiency competitive with mainstream vision-language models (VLMs) that have billions of parameters. In addition to offering a high-quality OCR model library, PaddleOCR 3.0 provides efficient tools for training, inference, and deployment, supports heterogeneous hardware acceleration, and enables developers to easily build intelligent document applications.

Country of Origin
🇨🇳 China


Page Count
24 pages

Category
Computer Science: Computer Vision and Pattern Recognition