Score: 2

IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch

Published: November 30, 2025 | arXiv ID: 2512.00997v1

By: Param Biyani, Shashank Kirtania, Yasharth Bajpai and more

BigTech Affiliations: Microsoft

Potential Business Impact:

Helps computers prove math problems from Olympiads.

Business Areas:
Artificial Intelligence, Data and Analytics, Science and Engineering, Software

We introduce IndiMathBench, a human-verified benchmark designed to evaluate mathematical theorem proving, curated using an AI-powered, human-assisted pipeline for formalizing natural language problems in Lean. IndiMathBench is composed of 312 formal Lean 4 theorems paired with their corresponding informal problem statements, sourced from Indian Mathematics Olympiads. Through category-based retrieval, iterative compiler feedback, and multi-model ensembles, our pipeline generates candidate formalizations that experts efficiently validate via an interactive dashboard with automated quality summaries. Evaluation across multiple frontier models shows that autoformalization remains challenging, with substantial gaps between syntactic validity and semantic correctness, and that theorem-proving success rates remain low even with iterative refinement, making IndiMathBench a challenging testbed for mathematical reasoning. IndiMathBench is available at https://github.com/prmbiy/IndiMathBench.
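
To make the idea of an informal problem statement paired with a formal Lean 4 theorem concrete, here is a minimal sketch. It is not taken from IndiMathBench: the problem, the theorem name `illustrative_two_term_inequality`, and the use of Mathlib are all assumptions chosen for illustration, and actual benchmark entries may be structured differently.

```lean
import Mathlib

/-- Informal statement (hypothetical, not a benchmark item):
    "Show that for all real numbers a and b, a² + b² ≥ 2ab." -/
theorem illustrative_two_term_inequality (a b : ℝ) :
    a ^ 2 + b ^ 2 ≥ 2 * a * b := by
  -- The inequality follows from (a - b)² ≥ 0.
  nlinarith [sq_nonneg (a - b)]
```

A statement like this can compile and still fail to capture the informal problem, for example if the variables were declared over the wrong number type. That is the kind of gap between syntactic validity and semantic correctness the abstract refers to.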

Country of Origin
🇺🇸 United States

Repos / Data Links
https://github.com/prmbiy/IndiMathBench

Page Count
27 pages

Category
Computer Science:
Artificial Intelligence