PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing
By: Cheng Cui, Ting Sun, Suyin Liang and more
Potential Business Impact:
Reads documents and recognizes seals better.
We introduce PaddleOCR-VL-1.5, an upgraded model that achieves a new state-of-the-art (SOTA) accuracy of 94.5% on OmniDocBench v1.5. To rigorously evaluate robustness against real-world physical distortions, including scanning, skew, warping, screen photography, and illumination, we propose the Real5-OmniDocBench benchmark. Experimental results show that the upgraded model also attains SOTA performance on this newly curated benchmark. Furthermore, we extend the model's capabilities with seal recognition and text spotting tasks, while the model remains a highly efficient, ultra-compact 0.9B VLM. Code: https://github.com/PaddlePaddle/PaddleOCR
Similar Papers
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
CV and Pattern Recognition
Reads and understands any document, fast.
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
CV and Pattern Recognition
Reads any document, even complex ones, fast.
MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns
CV and Pattern Recognition
Reads messy, complex documents perfectly.