EntroCoT: Enhancing Chain-of-Thought via Adaptive Entropy-Guided Segmentation
By: Zihang Li, Yuhang Wang, Yikun Zong, and more
Potential Business Impact:
Fixes AI math mistakes for better answers.
Chain-of-Thought (CoT) prompting has significantly enhanced the mathematical reasoning capabilities of Large Language Models. We find that existing fine-tuning datasets frequently suffer from the "answer right but reasoning wrong" problem, where correct final answers are derived from hallucinated, redundant, or logically invalid intermediate steps. This paper proposes EntroCoT, a unified framework for automatically identifying and refining low-quality CoT supervision traces. EntroCoT first applies an entropy-based mechanism to segment each reasoning trace into multiple steps at uncertain junctures, and then introduces a Monte Carlo rollout-based mechanism to evaluate the marginal contribution of each step. By accurately filtering deceptive reasoning samples, EntroCoT constructs a high-quality dataset in which every intermediate step of each reasoning trace contributes to the final answer. Extensive experiments on mathematical benchmarks demonstrate that fine-tuning on the subset constructed by EntroCoT consistently outperforms the baseline of full-dataset supervision.
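The entropy-guided segmentation idea in the abstract can be sketched in a few lines: compute the model's predictive entropy before each token and start a new reasoning step wherever that entropy spikes. The code below is a minimal illustration under assumed inputs (token strings paired with next-token probability distributions and a hand-picked threshold), not the paper's actual implementation.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def segment_by_entropy(tokens, per_token_probs, threshold=1.0):
    """Split a reasoning trace into steps at high-entropy junctures.

    tokens: list of token strings forming the trace.
    per_token_probs: the model's predictive distribution *before* each token
        (hypothetical input; a real pipeline would take these from logits).
    threshold: entropy (nats) above which a new segment begins (assumed value).
    """
    segments, current = [], []
    for tok, probs in zip(tokens, per_token_probs):
        # An uncertain juncture (high entropy) marks a step boundary.
        if current and token_entropy(probs) > threshold:
            segments.append(current)
            current = []
        current.append(tok)
    if current:
        segments.append(current)
    return segments

# Toy trace: near-deterministic distributions inside a step,
# a near-uniform one (entropy ln 4 ≈ 1.386) at the step boundary.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
tokens = ["First,", "add", "2+3.", "Then,", "multiply", "by", "4."]
probs = [confident, confident, confident, uncertain,
         confident, confident, confident]
print(segment_by_entropy(tokens, probs))
# → [['First,', 'add', '2+3.'], ['Then,', 'multiply', 'by', '4.']]
```

Once a trace is segmented this way, each step could then be scored by Monte Carlo rollouts (completing the trace with and without the step and comparing answer accuracy), as the abstract describes.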
Similar Papers
Compressing Chain-of-Thought in LLMs via Step Entropy
Artificial Intelligence
Makes AI think faster by cutting out extra words.
Non-Iterative Symbolic-Aided Chain-of-Thought for Logical Reasoning
Artificial Intelligence
Helps computers think through problems better.
Co-CoT: A Prompt-Based Framework for Collaborative Chain-of-Thought Reasoning
Computation and Language
Lets you change how AI thinks to understand it.