Search-Based Correction of Reasoning Chains for Language Models
By: Minsu Kim, Jean-Pierre Falet, Oliver E. Richardson and more
Potential Business Impact:
Finds and fixes mistakes in an AI's reasoning steps.
Chain-of-Thought (CoT) reasoning has advanced the capabilities and transparency of language models (LMs); however, reasoning chains can contain inaccurate statements that reduce performance and trustworthiness. To address this, we introduce a self-correction framework that augments each reasoning step in a CoT with a latent variable indicating its veracity, enabling modeling of all possible truth assignments rather than assuming every step is correct. To explore this expanded space efficiently, we propose Search Corrector, a discrete search algorithm over Boolean-valued veracity assignments. It performs otherwise intractable inference in the posterior distribution over veracity assignments by leveraging the LM's joint likelihood over veracity and the final answer as a proxy reward. This inference-time correction method also facilitates supervised fine-tuning of an Amortized Corrector by providing pseudo-labels for veracity. The Amortized Corrector generalizes self-correction, enabling accurate zero-shot veracity inference in novel contexts. Empirically, Search Corrector reliably identifies errors on logical (ProntoQA) and mathematical (GSM8K) reasoning benchmarks, and the Amortized Corrector achieves comparable zero-shot accuracy while improving final-answer accuracy by up to 25%.
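The abstract does not spell out the search procedure, but a minimal sketch of the idea is a local search over Boolean veracity vectors scored by a proxy reward. In the sketch below, `log_joint` is a hypothetical stand-in for the LM's joint log-likelihood over a veracity assignment and the final answer, and the greedy bit-flip strategy is an assumption for illustration, not the paper's exact algorithm.

```python
from typing import Callable, Tuple

def search_corrector(
    num_steps: int,
    log_joint: Callable[[Tuple[bool, ...]], float],
    max_rounds: int = 10,
) -> Tuple[Tuple[bool, ...], float]:
    """Greedy coordinate search over Boolean veracity assignments.

    `log_joint(v)` is a hypothetical placeholder for the LM's joint
    log-likelihood of the veracity assignment `v` and the final answer,
    used as a proxy reward. The search starts from the all-True
    assignment (every step presumed correct) and repeatedly flips the
    single bit that most improves the reward.
    """
    current = tuple([True] * num_steps)
    best_score = log_joint(current)
    for _ in range(max_rounds):
        improved = False
        for i in range(num_steps):
            # Flip the veracity of step i and score the new assignment.
            candidate = current[:i] + (not current[i],) + current[i + 1:]
            score = log_joint(candidate)
            if score > best_score:
                current, best_score = candidate, score
                improved = True
        if not improved:  # local optimum under single-bit flips
            break
    return current, best_score

# Toy usage: a reward that peaks when step 2 is marked as false.
if __name__ == "__main__":
    target = (True, True, False, True)
    reward = lambda v: -sum(a != b for a, b in zip(v, target))
    assignment, score = search_corrector(4, reward)
    print(assignment, score)  # (True, True, False, True) 0
```

A chain of n steps has 2^n possible truth assignments, which is why the abstract describes exact posterior inference as intractable; a local search like this one keeps the cost to O(n) reward evaluations per round at the price of only finding a local optimum.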
Similar Papers
Non-Iterative Symbolic-Aided Chain-of-Thought for Logical Reasoning
Artificial Intelligence
Helps computers think through problems better.
ASCoT: An Adaptive Self-Correction Chain-of-Thought Method for Late-Stage Fragility in LLMs
Computation and Language
Fixes AI mistakes that happen late in thinking.
Latent Chain-of-Thought for Visual Reasoning
Artificial Intelligence
Makes AI think step-by-step better for new problems.