
Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models

Published: May 6, 2025 | arXiv ID: 2505.03469v2

By: Bin Yu, Hang Yuan, Haotian Li and more

Potential Business Impact:

Makes AI think smarter, not longer.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Recent advances in large language models have demonstrated that Supervised Fine-Tuning (SFT) with Chain-of-Thought (CoT) reasoning data distilled from large reasoning models (e.g., DeepSeek R1) can effectively transfer reasoning capabilities to non-reasoning models. However, models fine-tuned with this approach inherit the "overthinking" problem from teacher models, producing verbose and redundant reasoning chains during inference. To address this challenge, we propose Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines a long CoT reasoning dataset with its short counterpart obtained through structure-preserved rewriting. Our experiments demonstrate that models trained with LS-Mixture SFT, compared to those trained with direct SFT, achieve an average accuracy improvement of 2.3% across various benchmarks while reducing model response length by approximately 47.61%. This work offers an approach to endow non-reasoning models with reasoning capabilities through supervised fine-tuning while avoiding the overthinking problem inherited from teacher models, thereby enabling efficient reasoning in the fine-tuned models.
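
The mixing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Example` record, the `build_ls_mixture` function, and the `style` tag are hypothetical names chosen for illustration, and the short reasoning traces are assumed to have been produced beforehand by some structure-preserved rewriting step that is not shown here. The abstract does not state a mixing ratio, so the sketch assumes one short counterpart per long example.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    """One SFT training record (hypothetical schema, not from the paper)."""
    question: str
    reasoning: str
    answer: str
    style: str  # "long" or "short"


def build_ls_mixture(long_examples, short_reasonings, seed=0):
    """Pair each long-CoT example with a short counterpart and shuffle.

    Assumption: short_reasonings[i] is a rewritten, shorter version of
    long_examples[i].reasoning, produced elsewhere.
    """
    if len(long_examples) != len(short_reasonings):
        raise ValueError("each long example needs exactly one short counterpart")

    mixture = []
    for example, short in zip(long_examples, short_reasonings):
        # Keep the original long-CoT record.
        mixture.append(example)
        # Add a short record with the same question and answer.
        mixture.append(Example(example.question, short, example.answer, "short"))

    # Shuffle with a fixed seed so the mixture is reproducible.
    random.Random(seed).shuffle(mixture)
    return mixture


# Toy usage with made-up data.
long_data = [
    Example("What is 2 + 3?", "First, take 2. Then add 3 step by step. Check: 5.", "5", "long"),
    Example("What is 4 * 2?", "Multiplying 4 by 2 means 4 + 4. That gives 8. Verify: 8.", "8", "long"),
]
short_data = ["2 + 3 = 5.", "4 * 2 = 8."]
mixture = build_ls_mixture(long_data, short_data)
```

The resulting list could then be passed to any standard SFT pipeline; how the two styles are distinguished at training and inference time is left open here because the abstract does not specify it.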

Empowering Lightweight MLLMs with Reasoning via Long CoT SFT

CV and Pattern Recognition

Teaches small AI to think better with examples.

3 Sep 2025 1

93%

Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracy

Computation and Language

Improves AI answers by focusing on the final solution.

24 Dec 2025 1

93%

The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs

Computation and Language

Makes AI better at thinking, but not always together.

10 Jul 2025 2


Page Count

12 pages
