MiniF2F-Dafny: LLM-Guided Mathematical Theorem Proving via Auto-Active Verification
By: Mantas Baksys, Stefan Zetzsche, Olivier Bouissou
Potential Business Impact:
AI helps computers prove math problems automatically.
We present miniF2F-Dafny, the first translation of the mathematical reasoning benchmark miniF2F to an automated theorem prover: Dafny. Previously, the benchmark existed only in interactive theorem provers (Lean, Isabelle, HOL Light, Metamath). We find that Dafny's automation verifies 99/244 (40.6%) of the test set and 109/244 (44.7%) of the validation set with empty proofs, i.e., requiring no manual proof steps. For problems where empty proofs fail, we evaluate 12 off-the-shelf LLMs on providing proof hints. The best model we test achieves a 55.7% pass@4 success rate when employing iterative error correction. These preliminary results highlight an effective division of labor: LLMs provide high-level guidance while automation handles low-level details. Our benchmark is available on GitHub at http://github.com/dafny-lang/miniF2F.
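
To illustrate the two proof modes described above, the sketch below shows, in Dafny, what an "empty proof" looks like and the kind of assert-style hint an LLM might supply. The lemmas are illustrative examples, not problems from the benchmark, and assume a standard Dafny toolchain with its default SMT-backed automation.

    // Illustrative sketch; these lemmas are not taken from miniF2F-Dafny.

    // (1) "Empty proof": Dafny's SMT-backed automation discharges the
    //     postcondition with no proof steps in the lemma body.
    lemma EmptyProofExample(a: int, b: int)
      requires a <= b
      ensures 2 * a <= a + b
    {
    }

    // (2) Hint-style proof: when automation alone falls short, an LLM can
    //     supply intermediate assertions as hints; the reasoning between
    //     the assertions is still handled automatically by the verifier.
    lemma HintedExample(a: int, b: int)
      requires 0 <= a <= b
      ensures a * a <= b * b
    {
      assert a * a <= a * b;  // multiply a <= b by the nonnegative a
      assert a * b <= b * b;  // multiply a <= b by the nonnegative b
    }

In the iterative error-correction setup mentioned in the abstract, the error messages from a failed verification attempt would be fed back to the model so it can refine such hints.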
Similar Papers
miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward
Artificial Intelligence
Helps computers solve math problems correctly.
MiniF2F in Rocq: Automatic Translation Between Proof Assistants -- A Case Study
Logic in Computer Science
Helps computers prove math theorems automatically.
Inferring multiple helper Dafny assertions with LLMs
Software Engineering
Helps computers prove code is correct automatically.