Score: 1

Self-Verifying Reflection Helps Transformers with CoT Reasoning

Published: October 14, 2025 | arXiv ID: 2510.12157v1

By: Zhongwei Yu, Wannian Xia, Xue Yan, and more

Potential Business Impact:

Lets small, cheap models solve hard reasoning problems (e.g., multi-digit multiplication, Sudoku) reliably.

Business Areas:
Autonomous Vehicles, Transportation

Advanced large language models (LLMs) frequently reflect during chain-of-thought (CoT) reasoning, self-verifying the correctness of their current solutions and exploring alternatives. However, given recent findings that LLMs detect only a limited share of the errors in their CoTs, how reflection contributes to empirical improvements remains unclear. To analyze this issue, we present a minimalistic reasoning framework that supports basic self-verifying reflection for small transformers without natural language, which ensures analytic clarity and reduces the cost of comprehensive experiments. Theoretically, we prove that self-verifying reflection guarantees improvements if verification errors are properly bounded. Experimentally, we show that tiny transformers, with only a few million parameters, benefit from self-verification in both training and reflective execution, reaching remarkable LLM-level performance in integer multiplication and Sudoku. Similar to results on LLMs, we find that reinforcement learning (RL) improves in-distribution performance and incentivizes frequent reflection in tiny transformers, yet RL mainly optimizes shallow statistical patterns without faithfully reducing verification errors. In conclusion, integrating generative transformers with discriminative verification inherently facilitates CoT reasoning, regardless of scale or the use of natural language.
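
To make the reflective-execution idea concrete, here is a minimal Python sketch of a generate-then-verify loop in the spirit of the abstract: a generative proposer emits the next CoT step, a discriminative verifier accepts or rejects it, and a rejection triggers reflection (re-sampling the step). The function names (`propose_step`, `verify_step`, `is_complete`) and the bounded-retry policy are illustrative assumptions, not the paper's actual API.

```python
from typing import Callable, List, Optional

def reflective_solve(
    propose_step: Callable[[List[str]], str],       # generator: context -> candidate next step
    verify_step: Callable[[List[str], str], bool],  # verifier: accept or reject the step
    is_complete: Callable[[List[str]], bool],       # has a full solution been reached?
    max_reflections: int = 8,                       # assumed cap on total rejected attempts
) -> Optional[List[str]]:
    """Build a CoT step by step, self-verifying each step and
    reflecting (re-sampling) whenever verification fails."""
    chain: List[str] = []
    reflections = 0
    while not is_complete(chain):
        step = propose_step(chain)
        if verify_step(chain, step):
            chain.append(step)      # accepted: extend the chain
        else:
            reflections += 1        # rejected: reflect and retry this step
            if reflections > max_reflections:
                return None         # give up if verification keeps failing
    return chain
```

The paper's theoretical condition that "verification errors are properly bounded" would correspond here to the verifier's false-accept and false-reject rates being small enough that accepted chains improve on unverified generation; this sketch does not model those rates explicitly.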

Country of Origin
🇬🇧 United Kingdom


Page Count
44 pages

Category
Computer Science:
Machine Learning (CS)