VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions
By: Thu Phuong Nguyen, Duc M. Nguyen, Hyotaek Jeon, and more
Potential Business Impact:
Helps computers grade handwritten math homework.
Automatically assessing handwritten mathematical solutions is an important problem in educational technology with practical applications, but it remains a significant challenge due to the diverse formats, unstructured layouts, and symbolic complexity of student work. To address this challenge, we introduce VEHME, a Vision-Language Model for Evaluating Handwritten Mathematics Expressions, designed to assess open-form handwritten math responses with high accuracy and interpretable reasoning traces. VEHME integrates a two-phase training pipeline: (i) supervised fine-tuning using structured reasoning data, and (ii) reinforcement learning that aligns model outputs with multi-dimensional grading objectives, including correctness, reasoning depth, and error localization. To enhance spatial understanding, we propose an Expression-Aware Visual Prompting Module, trained on our synthesized multi-line math expressions dataset to robustly guide attention in visually heterogeneous inputs. Evaluated on the AIHub and FERMAT datasets, VEHME achieves state-of-the-art performance among open-source models and approaches the accuracy of proprietary systems, demonstrating its potential as a scalable and accessible tool for automated math assessment. Our training and experiment code is publicly available at our GitHub repository.
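The reinforcement learning phase described in the abstract aligns model outputs with several grading objectives at once. The sketch below illustrates one plausible way such signals could be combined into a scalar reward; the field names, weights, and helper class are hypothetical assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class GradingSignal:
    """Hypothetical per-response grading signals (not the paper's exact schema)."""
    correctness: float       # 1.0 if the final answer is judged correct, else 0.0
    reasoning_depth: float   # fraction of required solution steps covered, in [0, 1]
    error_localized: float   # 1.0 if the flagged error location matches the reference, else 0.0


def combined_reward(signal: GradingSignal,
                    w_correct: float = 0.5,
                    w_depth: float = 0.3,
                    w_localize: float = 0.2) -> float:
    """Weighted sum of grading objectives; the weights are illustrative assumptions."""
    return (w_correct * signal.correctness
            + w_depth * signal.reasoning_depth
            + w_localize * signal.error_localized)


if __name__ == "__main__":
    # Example: correct final answer, most steps shown, error location matched.
    print(combined_reward(GradingSignal(1.0, 0.8, 1.0)))  # 0.94
```

A scalar of this form could serve as the per-sample reward in a standard policy-optimization loop; how the paper actually weights or structures its objectives is not specified here.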
Similar Papers
Mask & Match: Learning to Recognize Handwritten Math with Self-Supervised Attention
CV and Pattern Recognition
Lets computers understand handwritten math problems.
Link prediction Graph Neural Networks for structure recognition of Handwritten Mathematical Expressions
CV and Pattern Recognition
Lets computers understand handwritten math problems.
MathSight: A Benchmark Exploring Have Vision-Language Models Really Seen in University-Level Mathematical Reasoning?
CV and Pattern Recognition
Tests if computers *really* see math problems.