From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs
By: Jiaxiang Chen, Zhuo Wang, Mingxi Zou, and more
Potential Business Impact:
Helps AI reason step by step, fix its own mistakes, and learn from past successes and failures.
Large language models (LLMs) have advanced general-purpose reasoning, showing strong performance across diverse tasks. However, existing methods often rely on implicit exploration, where the model follows stochastic and unguided reasoning paths, like walking without a map. This leads to unstable reasoning, a lack of error correction, and limited learning from past experience. To address these issues, we propose a framework that shifts from implicit exploration to structured reasoning through guidelines and refinement. First, we extract structured reasoning patterns from successful trajectories and reflective signals from failures. During inference, the model follows these guidelines step by step, with refinement applied after each step to correct errors and stabilize the reasoning process. Experiments on BBH and four additional benchmarks (GSM8K, MATH-500, MBPP, HumanEval) show that our method consistently outperforms strong baselines across diverse reasoning tasks. Structured reasoning with stepwise execution and refinement improves stability and generalization, while guidelines transfer well across domains and flexibly support cross-model collaboration, matching or surpassing supervised fine-tuning in effectiveness and scalability.
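To make the inference loop the abstract describes more concrete, here is a minimal Python sketch of guideline-driven stepwise reasoning with per-step refinement. The names `call_llm` and `solve_with_guidelines` and the prompt wording are illustrative assumptions, not the authors' implementation; the guidelines are assumed to have been extracted beforehand from successful and failed trajectories.

```python
# A minimal sketch of structured reasoning: follow extracted guidelines
# step by step, refining each step before moving on. `call_llm` is a
# hypothetical stand-in for any chat-completion API; prompts and control
# flow here are assumptions, not the paper's exact method.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completion endpoint)."""
    raise NotImplementedError

def solve_with_guidelines(question: str, guidelines: list[str]) -> str:
    """Execute each guideline step, applying refinement after every step."""
    trace: list[str] = []  # accumulated, already-refined reasoning steps
    for i, guideline in enumerate(guidelines, start=1):
        context = "\n".join(trace)
        # Execute one guideline step, conditioned on the reasoning so far.
        step = call_llm(
            f"Question: {question}\n"
            f"Reasoning so far:\n{context}\n"
            f"Apply step {i}: {guideline}"
        )
        # Refinement: check the step and correct it if an error is found.
        critique = call_llm(
            f"Question: {question}\nStep taken: {step}\n"
            "Identify any error. If correct, reply 'OK'; "
            "otherwise give the corrected step."
        )
        trace.append(step if critique.strip() == "OK" else critique)
    # Produce the final answer from the stabilized reasoning trace.
    return call_llm(
        f"Question: {question}\nReasoning:\n" + "\n".join(trace)
        + "\nGive the final answer."
    )
```

The key design point, under these assumptions, is that refinement runs after every step rather than only on the final answer, so errors are corrected before they can propagate through the rest of the trace.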
Similar Papers
Cognitive Foundations for Reasoning and Their Manifestation in LLMs
Artificial Intelligence
Teaches computers to think more like people.
From Efficiency to Adaptivity: A Deeper Look at Adaptive Reasoning in Large Language Models
Artificial Intelligence
Computers change how they think based on how hard a problem is.
Guiding Reasoning in Small Language Models with LLM Assistance
Computation and Language
Helps small AI models do hard thinking with help from a big one.