Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer
By: Muhammad Tayyab Khan , Zane Yong , Lequn Chen and more
Potential Business Impact:
Helps machines read factory blueprints perfectly.
Accurate extraction of key information from 2D engineering drawings is crucial for high-precision manufacturing. Manual extraction is slow and labor-intensive, while traditional Optical Character Recognition (OCR) techniques often struggle with complex layouts and overlapping symbols, resulting in unstructured outputs. To address these challenges, this paper proposes a novel hybrid deep learning framework for structured information extraction by integrating an Oriented Bounding Box (OBB) detection model with a transformer-based document parsing model (Donut). An in-house annotated dataset is used to train YOLOv11 for detecting nine key categories: Geometric Dimensioning and Tolerancing (GD&T), General Tolerances, Measures, Materials, Notes, Radii, Surface Roughness, Threads, and Title Blocks. Detected OBBs are cropped into images and labeled to fine-tune Donut for structured JSON output. Fine-tuning strategies include a single model trained across all categories and category-specific models. Results show that the single model consistently outperforms category-specific ones across all evaluation metrics, achieving higher precision (94.77% for GD&T), recall (100% for most categories), and F1 score (97.3%), while reducing hallucinations (5.23%). The proposed framework improves accuracy, reduces manual effort, and supports scalable deployment in precision-driven industries.
Similar Papers
A Multi-Stage Hybrid Framework for Automated Interpretation of Multi-View Engineering Drawings Using Vision Language Model
CV and Pattern Recognition
Lets computers understand factory blueprints automatically.
Digitization of Document and Information Extraction using OCR
CV and Pattern Recognition
Reads messy and typed papers perfectly.
MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns
CV and Pattern Recognition
Reads messy, complex documents perfectly.