Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier

Published: May 17, 2025 | arXiv ID: 2505.11966v1

By: Jianyuan Zhong, Zeju Li, Zhijian Xu, and more

Potential Business Impact:

Makes AI smarter by checking its answers faster.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large Language Model (LLM) reasoning for complex tasks inherently involves a trade-off between solution accuracy and computational efficiency. The subsequent step of verification, while intended to improve performance, further complicates this landscape by introducing its own challenging trade-off: sophisticated Generative Reward Models (GenRMs) can be computationally prohibitive if naively integrated with LLMs at test-time, while simpler, faster methods may lack reliability. To overcome these challenges, we introduce FlexiVe, a novel generative verifier that flexibly balances computational resources between rapid, reliable fast thinking and meticulous slow thinking using a Flexible Allocation of Verification Budget strategy. We further propose the Solve-Detect-Verify pipeline, an efficient inference-time scaling framework that intelligently integrates FlexiVe, proactively identifying solution completion points to trigger targeted verification and provide focused solver feedback. Experiments show FlexiVe achieves superior accuracy in pinpointing errors within reasoning traces on ProcessBench. Furthermore, on challenging mathematical reasoning benchmarks (AIME 2024, AIME 2025, and CNMO), our full approach outperforms baselines like self-consistency in reasoning accuracy and inference efficiency. Our system offers a scalable and effective solution to enhance LLM reasoning at test time.
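
The abstract does not spell out how the verification budget is allocated, so the following is only a minimal sketch of one plausible reading: run several cheap "fast" checks first, and escalate to a single expensive "slow" check only when the fast checks disagree. The function name `flexible_verify`, the callables `fast_verify` and `slow_verify`, and the parameters `num_fast` and `agreement_threshold` are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Tuple


def flexible_verify(
    solution: str,
    fast_verify: Callable[[str], bool],   # hypothetical cheap verifier call
    slow_verify: Callable[[str], bool],   # hypothetical expensive verifier call
    num_fast: int = 4,                    # assumed number of fast samples
    agreement_threshold: float = 1.0,     # assumed fraction of votes that must agree
) -> Tuple[bool, str]:
    """Return (verdict, mode), where mode records which path produced the verdict."""
    # Spend the cheap budget first: sample several fast verdicts.
    votes = [fast_verify(solution) for _ in range(num_fast)]
    majority, count = Counter(votes).most_common(1)[0]

    # If the fast verdicts agree strongly enough, accept their majority.
    if count / num_fast >= agreement_threshold:
        return majority, "fast"

    # Otherwise fall back to the slower, more careful verification.
    return slow_verify(solution), "slow"
```

Under these assumptions, the slow verifier is invoked only for solutions on which the fast checks disagree, which is one way the extra verification cost could be concentrated on the hard cases.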

When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning

Computation and Language

Makes AI better at math by choosing the best answers.

1 Apr 2025 1

89%

Variation in Verification: Understanding Verification Dynamics in Large Language Models

Computation and Language

Makes AI better at checking its own answers.

22 Sep 2025 1

89%

Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection

Artificial Intelligence

Helps computers check math proofs for mistakes.

17 Nov 2025 1

Page Count

18 pages
