DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs
By: Shidong Cao, Hongzhan Lin, Yuxuan Gu, and more
Potential Business Impact:
Fixes math mistakes in AI step-by-step thinking.
Chain-of-Thought (CoT) reasoning improves multi-step mathematical problem solving in large language models but remains vulnerable to exposure bias and error accumulation, as early mistakes propagate irreversibly through autoregressive decoding. In this work, we propose DiffCoT, a diffusion-styled CoT framework that reformulates CoT reasoning as an iterative denoising process. DiffCoT integrates diffusion principles at the reasoning-step level via a sliding-window mechanism, enabling unified generation and retrospective correction of intermediate steps while preserving token-level autoregression. To maintain causal consistency, we further introduce a causal diffusion noise schedule that respects the temporal structure of reasoning chains. Extensive experiments on three multi-step CoT reasoning benchmarks across diverse model backbones demonstrate that DiffCoT consistently outperforms existing CoT preference optimization methods, yielding improved robustness and error-correction capability in CoT reasoning.
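The abstract's core idea can be illustrated with a toy sketch: reasoning steps are revisited inside a sliding window, with a causal noise schedule that perturbs later (less trusted) steps more than earlier ones, and the noise annealed toward zero across iterations. All names here (`refine_step`, `causal_noise_schedule`, `diffcot_denoise`) and the threshold logic are illustrative assumptions, not the authors' implementation, which operates with an actual LLM at the reasoning-step level.

```python
def causal_noise_schedule(step_idx: int, num_steps: int, t: float) -> float:
    """Causal schedule (assumed form): at global noise level t, later
    steps receive proportionally more noise, so earlier steps in the
    chain are perturbed less -- respecting temporal/causal order."""
    return t * (step_idx + 1) / num_steps

def refine_step(step: str, noise: float) -> str:
    # Placeholder for a model call that re-generates a noisy reasoning
    # step conditioned on its context; here it just marks steps whose
    # noise exceeds an (assumed) revision threshold.
    base = step.removesuffix(" [revised]")
    return base + " [revised]" if noise >= 0.5 else base

def diffcot_denoise(chain: list[str], window: int = 2, iters: int = 3) -> list[str]:
    """Iteratively sweep a sliding window over the reasoning chain,
    re-denoising the steps it covers while annealing the global noise
    level from 1.0 toward 0, so early mistakes can still be revised."""
    n = len(chain)
    for it in range(iters):
        t = 1.0 - it / iters  # annealed global noise level
        for start in range(0, n - window + 1):
            for i in range(start, start + window):
                chain[i] = refine_step(chain[i], causal_noise_schedule(i, n, t))
    return chain
```

At high noise (early iterations) later steps get flagged for revision; as the schedule anneals, the chain settles into its corrected form.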
Similar Papers
CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning
Computation and Language
Teaches computers to reason better in science.
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
Computation and Language
Helps computers think better without changing them.