LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation
By: Junyu Lai, Jiakun Zhang, Shuo Xu, and more
Potential Business Impact:
Teaches computers to prove math theorems faster.
Recent advancements in large language models (LLMs) have sparked considerable interest in automated theorem proving, and a prominent line of research integrates stepwise LLM-based provers into tree search. In this paper, we introduce a novel proof-state exploration approach for training data synthesis, designed to produce diverse tactics across a wide range of intermediate proof states, thereby facilitating effective one-shot fine-tuning of an LLM as the policy model. We also propose an adaptive beam size strategy, which takes advantage of our data synthesis method and strikes a trade-off between exploration and exploitation during tree search. Evaluations on the MiniF2F and ProofNet benchmarks demonstrate that our method outperforms strong baselines under the stringent Pass@1 metric, attaining an average pass rate of $60.74\%$ on MiniF2F and $21.18\%$ on ProofNet. These results underscore the impact of large-scale synthetic data in advancing automated theorem proving.
Similar Papers
Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs
Logic in Computer Science
Makes AI better at proving math ideas.
Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs
Computation and Language
Helps computers prove math theorems using smart connections.
Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification
Artificial Intelligence
Helps computers check if code is correct.