DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios
By: Yufeng Zhong , Zhixiong Zeng , Lei Chen and more
Potential Business Impact:
Reads math formulas from books automatically.
Optical Character Recognition (OCR) for mathematical formula is essential for the intelligent analysis of scientific literature. However, both task-specific and general vision-language models often struggle to handle the structural diversity, complexity, and real-world variability inherent in mathematical content. In this work, we present DocTron-Formula, a unified framework built upon general vision-language models, thereby eliminating the need for specialized architectures. Furthermore, we introduce CSFormula, a large-scale and challenging dataset that encompasses multidisciplinary and structurally complex formulas at the line, paragraph, and page levels. Through straightforward supervised fine-tuning, our approach achieves state-of-the-art performance across a variety of styles, scientific domains, and complex layouts. Experimental results demonstrate that our method not only surpasses specialized models in terms of accuracy and robustness, but also establishes a new paradigm for the automated understanding of complex scientific documents.
Similar Papers
MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns
CV and Pattern Recognition
Reads messy, complex documents perfectly.
dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
CV and Pattern Recognition
AI reads any document, in any language, perfectly.
The Return of Structural Handwritten Mathematical Expression Recognition
CV and Pattern Recognition
Helps computers understand math handwriting better.