Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets
By: Adam Younsi, Abdalgader Abubaker, Mohamed El Amine Seddik, and more
Potential Business Impact:
Helps AI language models solve math problems more accurately while producing more varied solution approaches.
Achieving both accuracy and diverse reasoning remains challenging for Large Language Models (LLMs) in complex domains like mathematics. A key bottleneck is evaluating intermediate reasoning steps to guide generation without costly human annotations. To address this, we first introduce a novel Process Reward Model (PRM) trained automatically using Monte Carlo Tree Search coupled with a similarity-based data augmentation technique, effectively capturing step-level reasoning quality. Leveraging this PRM, we then adapt Generative Flow Networks (GFlowNets) to operate at the reasoning step level. Unlike traditional reinforcement learning focused on maximizing a single reward, GFlowNets naturally sample diverse, high-quality solutions proportional to their rewards, as measured by our PRM. Empirical evaluation shows strong improvements in both accuracy and solution diversity on challenging mathematical benchmarks (e.g., +2.59% absolute accuracy on MATH Level 5 for Llama3.2-3B), with effective generalization to unseen datasets (+9.4% absolute on SAT MATH). Our work demonstrates the potential of PRM-guided, step-level GFlowNets for developing more robust and versatile mathematical reasoning in LLMs.
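To illustrate the step-level, reward-proportional sampling behaviour the abstract contrasts with reward-maximizing RL, here is a minimal Python sketch. It is an assumption-laden illustration, not the paper's method: the `prm_score` stub stands in for the trained Process Reward Model, and the paper trains a GFlowNet policy rather than sampling directly from PRM scores as done here.

```python
import math
import random
from typing import List

def prm_score(partial_solution: List[str], candidate_step: str) -> float:
    """Hypothetical stand-in for a trained PRM: scores a candidate next
    reasoning step given the partial solution so far, in [0, 1]. The real
    PRM is trained from MCTS-derived step labels; this stub returns fixed
    toy scores purely for illustration."""
    toy_scores = {
        "factor the quadratic": 0.70,
        "complete the square": 0.65,
        "guess and check": 0.10,
    }
    return toy_scores.get(candidate_step, 0.05)

def sample_next_step(partial_solution: List[str],
                     candidates: List[str],
                     temperature: float = 1.0) -> str:
    """Sample the next reasoning step with probability proportional to its
    PRM reward (softmax over log-rewards) instead of taking the argmax.
    This mimics the GFlowNet-style behaviour described in the abstract:
    high-reward steps are favoured, but other valid steps keep non-zero
    probability, preserving solution diversity."""
    rewards = [max(prm_score(partial_solution, c), 1e-8) for c in candidates]
    logits = [math.log(r) / temperature for r in rewards]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(candidates, weights=probs, k=1)[0]

if __name__ == "__main__":
    partial = ["Let x be the unknown; the equation is x^2 - 5x + 6 = 0."]
    cands = ["factor the quadratic", "complete the square", "guess and check"]
    counts = {c: 0 for c in cands}
    for _ in range(1000):
        counts[sample_next_step(partial, cands)] += 1
    # Both high-reward steps appear frequently; greedy decoding would
    # collapse onto a single one.
    print(counts)
```

Running the sketch shows the two high-reward steps sampled at comparable rates, which is the diversity property reward-proportional sampling provides over pure reward maximization.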
Similar Papers
Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
Computation and Language
Teaches computers to solve problems step-by-step.
GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning
Computation and Language
Fixes math problems by explaining each step.