Think Like Human Developers: Harnessing Community Knowledge for Structured Code Reasoning
By: Chengran Yang, Zhensu Sun, Hong Jin Kang and more
Potential Business Impact:
Helps computers write complex code by learning from people.
Large Language Models (LLMs) have significantly advanced automated code generation, yet they struggle with complex coding tasks requiring multi-step logical reasoning. High-quality reasoning data is crucial for improving LLMs' reasoning capabilities, but such datasets remain scarce. Existing approaches rely either on computationally expensive reinforcement learning (RL) or on error-prone reasoning chains synthesized by LLMs, posing challenges in scalability and accuracy. To address this challenge, we propose SVRC (Structured and Validated Reasoning Chains for Code Generation), a novel framework that mines, restructures, and enriches reasoning chains from community-driven discussions on software engineering platforms. SVRC refines unstructured and incomplete discussions of coding problems by aligning them with Software Development Life Cycle (SDLC) principles, ensuring that reasoning chains capture real-world problem-solving strategies and support iterative refinement. To evaluate the effectiveness of SVRC, we introduce CodeThinker, an LLM fine-tuned on 12,444 reasoning-augmented samples generated by SVRC. Experiments on LiveCodeBench show that CodeThinker surpasses its base model by 42.86% on medium-level code problems in terms of pass@1 and outperforms GPT-4o-mini and GPT-4o by 73.14% and 115.86%, respectively. Our ablation study further highlights that each component of SVRC contributes to the reasoning capabilities of CodeThinker.
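The reported gains are stated in terms of pass@1. As a rough illustration only, the sketch below computes the commonly used unbiased pass@k estimate and a relative improvement between two scores. It assumes the percentages in the abstract are relative improvements (a gain above 100% suggests so, but the abstract does not say). The function names and the sample counts are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number that pass all tests,
    k: budget of samples considered. Computes 1 - C(n-c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# Hypothetical counts, for illustration only (not results from the paper).
base_score = pass_at_k(n=10, c=2, k=1)   # base model: 2 of 10 samples pass
tuned_score = pass_at_k(n=10, c=4, k=1)  # fine-tuned model: 4 of 10 pass
gain = relative_gain(tuned_score, base_score)
print(f"pass@1 base={base_score:.2f} tuned={tuned_score:.2f} gain={gain:.2f}%")
```

For k = 1 the estimate reduces to the fraction of passing samples, c / n, so a benchmark-level pass@1 is the average of that fraction over all problems.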
Similar Papers
How Does LLM Reasoning Work for Code? A Survey and a Call to Action
Software Engineering
Helps computers fix and write computer code.
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning
Artificial Intelligence
Helps computers solve math and logic problems.
A Study on Thinking Patterns of Large Reasoning Models in Code Generation
Software Engineering
Helps computers write better code by understanding thinking.