A Survey on LLM Mid-training
By: Chengying Tu, Xuemiao Zhang, Rongxiang Weng, and more
Potential Business Impact:
Enables targeted improvement of model skills such as math, coding, and reasoning after initial pre-training, without retraining from scratch.
Recent advances in foundation models have highlighted the significant benefits of multi-stage training, with particular emphasis on the emergence of mid-training as a vital stage bridging pre-training and post-training. Mid-training is distinguished by its use of intermediate data and computational resources, systematically enhancing targeted capabilities such as mathematics, coding, reasoning, and long-context extension while maintaining foundational competencies. This survey provides a formal definition of mid-training for large language models (LLMs) and investigates optimization frameworks encompassing data curation, training strategies, and model architecture optimization. We analyze mainstream model implementations in the context of objective-driven interventions, illustrating how mid-training serves as a distinct and critical stage in the progressive development of LLM capabilities. By clarifying the unique contributions of mid-training, this survey offers a comprehensive taxonomy and actionable insights, supporting future research and innovation in the advancement of LLMs.
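To make the data-curation idea concrete, below is a minimal Python sketch of how a mid-training stage might re-weight the data mixture toward capability-targeted domains while retaining general data. The domain names, mixture weights, step boundary, and the hard-switch schedule are all illustrative assumptions for this sketch, not details taken from the survey.

```python
# Sketch: objective-driven data mixture switching between pre-training
# and mid-training. All domains, weights, and the step boundary are
# assumptions chosen for illustration only.
import random

# Hypothetical sampling weights per data domain. Pre-training leans on
# broad web text; mid-training up-weights math, code, and long-context
# data while keeping some general data to preserve foundational skills.
PRETRAIN_MIX = {"web": 0.70, "books": 0.15, "code": 0.10, "math": 0.05}
MIDTRAIN_MIX = {"web": 0.30, "books": 0.10, "code": 0.25,
                "math": 0.20, "long_context": 0.15}

def mixture_for_step(step: int, midtrain_start: int) -> dict[str, float]:
    """Return the stage's mixture; an assumed hard switch at a fixed
    step boundary (real schedules may anneal the mix gradually)."""
    return MIDTRAIN_MIX if step >= midtrain_start else PRETRAIN_MIX

def sample_domain(mix: dict[str, float], rng: random.Random) -> str:
    """Draw one data domain according to the mixture weights."""
    domains, weights = zip(*mix.items())
    return rng.choices(domains, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {"pre": {}, "mid": {}}
    for step in range(20_000):
        stage = "mid" if step >= 10_000 else "pre"
        d = sample_domain(mixture_for_step(step, midtrain_start=10_000), rng)
        counts[stage][d] = counts[stage].get(d, 0) + 1
    print("pre-training sample counts:", counts["pre"])
    print("mid-training sample counts:", counts["mid"])
```

Running the sketch shows the sampled-domain counts shifting toward math, code, and long-context data after the boundary, which is the mechanism by which a mid-training stage can strengthen targeted capabilities without abandoning general data.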
Similar Papers
Mid-Training of Large Language Models: A Survey
Computation and Language
Surveys mid-training techniques for large language models.
Midtraining Bridges Pretraining and Posttraining Distributions
Computation and Language
Examines how midtraining bridges the data distributions of pretraining and posttraining.
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
Computation and Language
Studies how pre-training, mid-training, and reinforcement learning interact to shape reasoning in language models.