Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracy
By: Xiaofeng Shi, Qian Kou, Yuduo Li, and more
Potential Business Impact:
Improves AI answers by focusing on the final solution.
With the rapid advancement of Large Language Models (LLMs), the Chain-of-Thought (CoT) component has become significant for complex reasoning tasks. However, in conventional Supervised Fine-Tuning (SFT), the model can allocate disproportionately more attention to excessively long CoT sequences. This reduces focus on the much shorter but essential Key portion, the final answer, whose correctness directly determines task success and evaluation quality. To address this limitation, we propose SFTKey, a two-stage training scheme: in the first stage, conventional SFT is applied to ensure proper output format; in the second stage, only the Key portion is fine-tuned to improve accuracy. Extensive experiments across multiple benchmarks and model families demonstrate that SFTKey achieves an average accuracy improvement exceeding 5% over conventional SFT, while preserving the ability to generate correctly formatted outputs. Overall, this study advances LLM fine-tuning by explicitly balancing CoT learning with additional optimization on answer-relevant tokens.
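To make the second stage concrete, here is a minimal sketch of a "Key-only" loss, assuming a HuggingFace-style causal LM trained with PyTorch. The idea is standard label masking: prompt and CoT tokens are set to -100 (the ignore_index of PyTorch's CrossEntropyLoss) so that only the final-answer tokens contribute gradient. The names key_only_labels and answer_start are illustrative, not from the paper, and this is not the authors' released implementation.

import torch

def key_only_labels(input_ids: torch.Tensor, answer_start: int) -> torch.Tensor:
    """Copy input_ids as labels, but mask everything before the answer span.

    input_ids:    (seq_len,) token ids of prompt + CoT + final answer.
    answer_start: index of the first token of the final answer (the "Key").
    """
    labels = input_ids.clone()
    labels[:answer_start] = -100  # positions ignored by CrossEntropyLoss
    return labels

# Usage inside an otherwise standard stage-2 SFT step (hypothetical names):
#   outputs = model(input_ids=batch_ids, labels=key_only_labels(batch_ids, ans_idx))
#   outputs.loss.backward()  # gradients flow only from answer-token positions

Under this reading, stage 1 is ordinary SFT over the full sequence (format learning), and stage 2 reuses the same loop with the masked labels above so optimization concentrates on the answer-relevant tokens.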
Similar Papers
Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
Computation and Language
Makes AI think smarter, not longer.
Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning
Computation and Language
Teaches AI to focus on important math steps.
Empowering Lightweight MLLMs with Reasoning via Long CoT SFT
CV and Pattern Recognition
Teaches small AI to think better with examples.