Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking

Published: March 25, 2025 | arXiv ID: 2503.19855v1

By: Xiaoyu Tian, Sitong Zhao, Haotian Wang and more

Potential Business Impact:

Makes AI smarter by letting it think more.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Recent advances in large language models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have demonstrated the effectiveness of test-time scaling, where extended reasoning processes substantially enhance model performance. Despite this, current models are constrained by limitations in handling long texts and by the efficiency of reinforcement learning (RL) training. To address these issues, we propose a simple yet effective test-time scaling approach, Multi-round Thinking. This method iteratively refines model reasoning by using the previous round's answer as part of the prompt for the next round. Extensive experiments across multiple models, including QwQ-32B and DeepSeek-R1, consistently show performance improvements on various benchmarks such as AIME 2024, MATH-500, GPQA-diamond, and LiveCodeBench. For instance, the accuracy of QwQ-32B improved from 80.3% (Round 1) to 82.1% (Round 2) on the AIME 2024 dataset, while DeepSeek-R1 showed a similar increase from 79.7% to 82.0%. These results confirm that Multi-round Thinking is a broadly applicable, straightforward approach to achieving stable enhancements in model performance, underscoring its potential for future developments in test-time scaling techniques.

The key prompt: {Original question prompt} The assistant's previous answer is: <answer> {last round answer} </answer>, and please re-answer.
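The loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The names `RETHINK_TEMPLATE`, `build_round_prompt`, `multi_round_thinking`, and the `generate` callable are hypothetical, and the sketch assumes that `generate` wraps some model call and returns only the final answer text. The newline between the question and the re-answer instruction is also an assumption, since the quoted prompt does not show its exact whitespace.

```python
from typing import Callable, List, Optional

# Prompt wording taken from the key prompt quoted in the abstract.
# The "\n" separator after the question is an assumption.
RETHINK_TEMPLATE = (
    "{question}\n"
    "The assistant's previous answer is: <answer> {last_answer} </answer>, "
    "and please re-answer."
)


def build_round_prompt(question: str, last_answer: Optional[str]) -> str:
    """Return the prompt for one round.

    Round 1 has no previous answer, so the question is sent unchanged.
    Later rounds append the previous answer and the re-answer instruction.
    """
    if last_answer is None:
        return question
    return RETHINK_TEMPLATE.format(question=question, last_answer=last_answer)


def multi_round_thinking(
    question: str,
    generate: Callable[[str], str],  # hypothetical model wrapper: prompt -> answer
    rounds: int = 2,
) -> List[str]:
    """Run `rounds` rounds and return the answer produced in each round."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    answers: List[str] = []
    last_answer: Optional[str] = None
    for _ in range(rounds):
        prompt = build_round_prompt(question, last_answer)
        last_answer = generate(prompt)
        answers.append(last_answer)
    return answers
```

With `rounds=2`, the last element of the returned list corresponds to the "Round 2" answer in the abstract's terminology. Any callable that maps a prompt string to an answer string can be passed as `generate`.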

Scaling Reasoning can Improve Factuality in Large Language Models

Computation and Language

Makes computers answer questions more accurately.

16 May 2025 1

91%

A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law

Artificial Intelligence

Computers learn to think deeply like people.

5 May 2025 1

91%

Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models

Artificial Intelligence

Thinking more makes computers worse at thinking.

4 Jun 2025 0

Page Count

11 pages
