UniRec-0.1B: Unified Text and Formula Recognition with 0.1B Parameters
By: Yongkun Du , Zhineng Chen , Yazhen Xie and more
Text and formulas constitute the core informational components of many documents. Accurately and efficiently recognizing both is crucial for developing robust and generalizable document parsing systems. Recently, vision-language models (VLMs) have achieved impressive unified recognition of text and formulas. However, they are large-sized and computationally demanding, restricting their usage in many applications. In this paper, we propose UniRec-0.1B, a unified recognition model with only 0.1B parameters. It is capable of performing text and formula recognition at multiple levels, including characters, words, lines, paragraphs, and documents. To implement this task, we first establish UniRec40M, a large-scale dataset comprises 40 million text, formula and their mix samples, enabling the training of a powerful yet lightweight model. Secondly, we identify two challenges when building such a lightweight but unified expert model. They are: structural variability across hierarchies and semantic entanglement between textual and formulaic content. To tackle these, we introduce a hierarchical supervision training that explicitly guides structural comprehension, and a semantic-decoupled tokenizer that separates text and formula representations. Finally, we develop a comprehensive evaluation benchmark covering Chinese and English documents from multiple domains and with multiple levels. Experimental results on this and public benchmarks demonstrate that UniRec-0.1B outperforms both general-purpose VLMs and leading document parsing expert models, while achieving a 2-9$\times$ speedup, validating its effectiveness and efficiency. Codebase and Dataset: https://github.com/Topdu/OpenOCR.
Similar Papers
DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios
CV and Pattern Recognition
Reads math formulas from books automatically.
Uni-Parser Technical Report
CV and Pattern Recognition
Reads science papers and patents super fast.
MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns
CV and Pattern Recognition
Reads messy, complex documents perfectly.