Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
By: Yansi Li, Jiahao Xu, Tian Liang, and more
Potential Business Impact:
Helps computers think through hard problems better.
Enhancing the reasoning capabilities of large language models (LLMs), particularly for complex tasks requiring multi-step logical deductions, remains a significant challenge. Traditional inference-time scaling methods use scalar reward signals from process reward models to evaluate candidate reasoning steps, but these scalar rewards lack the nuanced qualitative information essential for understanding and justifying each step. In this paper, we propose a novel inference-time scaling approach -- stepwise natural language self-critique (PANEL) -- which employs self-generated natural language critiques as feedback to guide the step-level search process. By generating rich, human-readable critiques for each candidate reasoning step, PANEL retains essential qualitative information, facilitating better-informed decision-making during inference. This approach bypasses the need for task-specific verifiers and the associated training overhead, making it broadly applicable across diverse tasks. Experimental results on challenging reasoning benchmarks, including AIME and GPQA, demonstrate that PANEL significantly enhances reasoning performance, outperforming traditional scalar reward-based methods. Our code is available at https://github.com/puddingyeah/PANEL to support and encourage future research in this promising field.
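To make the idea concrete, here is a minimal sketch of a critique-guided step-level search in the spirit described by the abstract. All function names (`call_llm`, `propose_steps`, `critique`, `select_step`, `panel_search`) are hypothetical illustrations, and the LLM call is stubbed with canned responses so the control flow can run; a real system would route these prompts to an actual model.

```python
# Sketch: step-level search guided by natural-language self-critiques
# instead of scalar reward scores. All names here are illustrative,
# not taken from the PANEL codebase.

def call_llm(prompt: str) -> str:
    # Hypothetical LLM call, stubbed with canned responses for illustration.
    if "Critique" in prompt:
        return "flawed step" if "guess" in prompt else "sound step"
    return "candidate"

def propose_steps(state: str, n: int = 3) -> list[str]:
    # Sample n candidate next reasoning steps (stubbed here).
    return [f"step-{i}: {'guess' if i == 0 else 'derive'}" for i in range(n)]

def critique(state: str, step: str) -> str:
    # Self-generated natural-language critique of a candidate step.
    return call_llm(f"Critique this step given the context:\n{state}\n{step}")

def select_step(state: str, candidates: list[str]) -> str:
    # Keep candidates whose critique is favorable; fall back to the first.
    critiqued = [(s, critique(state, s)) for s in candidates]
    sound = [s for s, c in critiqued if "sound" in c]
    return sound[0] if sound else candidates[0]

def panel_search(problem: str, depth: int = 2) -> list[str]:
    # Greedy step-level search: at each depth, propose candidates,
    # critique them in natural language, and extend with the best one.
    trace, state = [], problem
    for _ in range(depth):
        best = select_step(state, propose_steps(state))
        trace.append(best)
        state = state + "\n" + best
    return trace
```

The key design point the abstract emphasizes is that the selection signal is a readable critique rather than a single scalar, so the search retains qualitative justification for why each step was kept or discarded.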
Similar Papers
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
Computation and Language
Helps computers learn better from mistakes and feedback.
Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning
Artificial Intelligence
Teaches computers to think and check their own answers.
DeepCritic: Deliberate Critique with Large Language Models
Computation and Language
Helps AI check math answers more carefully.