LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation
By: Junyu Lai, Jiakun Zhang, Shuo Xu, and more
Potential Business Impact:
Teaches computers to prove math theorems faster.
Recent advancements in large language models (LLMs) have sparked considerable interest in automated theorem proving, and a prominent line of research integrates stepwise LLM-based provers into tree search. In this paper, we introduce a novel proof-state exploration approach for training data synthesis, designed to produce diverse tactics across a wide range of intermediate proof states, thereby facilitating effective one-shot fine-tuning of an LLM as the policy model. We also propose an adaptive beam size strategy, which takes advantage of our data synthesis method and strikes a trade-off between exploration and exploitation during tree search. Evaluations on the MiniF2F and ProofNet benchmarks demonstrate that our method outperforms strong baselines under the stringent Pass@1 metric, attaining an average pass rate of $60.74\%$ on MiniF2F and $21.18\%$ on ProofNet. These results underscore the impact of large-scale synthetic data in advancing automated theorem proving.
Similar Papers
Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs
Logic in Computer Science
Makes AI better at proving math ideas.
Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs
Computation and Language
Helps computers prove math theorems using smart connections.
Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification
Artificial Intelligence
Helps computers check if code is correct.