ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
By: Tao Liu, Taiqiang Wu, Runming Yang, and more
Supervised fine-tuning (SFT) is a fundamental post-training strategy for aligning Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer, which causes the model to overfit to non-core surface expressions. Although our empirical analysis suggests that introducing multiple reference answers can mitigate this issue, the prohibitive data and computational costs necessitate a strategic shift: prioritizing the mitigation of single-reference overfitting over the costly pursuit of answer diversity. To achieve this, we reveal an intrinsic connection between token probability and semantic importance: high-probability tokens carry the core logical framework, while low-probability tokens are mostly replaceable expressions. Based on this insight, we propose ProFit, which selectively masks low-probability tokens to prevent surface-level overfitting. Extensive experiments confirm that ProFit consistently outperforms traditional SFT baselines on general reasoning and mathematical benchmarks.
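The selection idea in the abstract can be sketched as a masked token-level loss: score each reference token by its probability under the model, then zero out the loss contribution of the low-probability tokens, which the paper argues are mostly replaceable surface expressions. The sketch below uses NumPy for clarity; the `keep_quantile` hyperparameter and the quantile-based cutoff are assumptions for illustration, not the paper's exact selection rule.

```python
import numpy as np

def profit_masked_nll(logits, targets, keep_quantile=0.5):
    """ProFit-style token selection (illustrative sketch).

    logits:  (seq_len, vocab) array of model logits.
    targets: (seq_len,) array of reference token ids.
    keep_quantile: hypothetical knob, the fraction of highest-probability
        tokens whose loss is kept; the rest are masked out.
    Returns the mean negative log-likelihood over the kept tokens.
    """
    # Softmax over the vocabulary (numerically stabilized).
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

    # Probability the model assigns to each reference token.
    tok_p = probs[np.arange(len(targets)), targets]

    # Keep only the top `keep_quantile` fraction of tokens by probability;
    # low-probability tokens are masked (their loss is dropped).
    thresh = np.quantile(tok_p, 1.0 - keep_quantile)
    mask = tok_p >= thresh

    nll = -np.log(tok_p + 1e-12)
    return float((nll * mask).sum() / max(mask.sum(), 1))
```

With `keep_quantile=1.0` no token is masked and the function reduces to the standard mean cross-entropy of plain SFT, which makes the comparison against the baseline direct.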