Excess Description Length of Learning Generalizable Predictors
By: Elizabeth Donoway, Hailey Joren, Fabien Roger, and more
Potential Business Impact:
Measures how much a computer learned from training.
Understanding whether fine-tuning elicits latent capabilities or teaches new ones is a fundamental question for language model evaluation and safety. We develop a formal information-theoretic framework for quantifying how much predictive structure fine-tuning extracts from the training dataset and writes into a model's parameters. Our central quantity, Excess Description Length (EDL), is defined via prequential coding and measures the gap between the bits required to encode the training labels sequentially using an evolving model (trained online) and the residual encoding cost under the final trained model. We establish that EDL is non-negative in expectation, converges to surplus description length in the infinite-data limit, and provides bounds on expected generalization gain. Through a series of toy models, we clarify common confusions about information in learning: why random labels yield EDL near zero, how a single example can eliminate many bits of uncertainty about the underlying rule(s) that describe the data distribution, why structure learned on rare inputs contributes proportionally little to expected generalization, and how format learning creates early transients distinct from capability acquisition. This framework provides rigorous foundations for the empirical observation that capability elicitation and teaching exhibit qualitatively distinct scaling signatures.
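To make the prequential definition concrete, here is a minimal sketch under stated assumptions. It reads EDL as "online code length of the training labels minus the code length of the same labels under the final model", which is our reading of the abstract; the paper's exact definition may differ (for example, in how the residual term is computed). The learner is an exact Bayesian posterior over a hypothetical family of "shift" rules `y = (x + c) mod M` with label noise `eps`, standing in for online fine-tuning. The names `likelihood`, `predictive`, `update`, and `excess_description_length` are illustrative and do not come from the paper.

```python
import math

def likelihood(y, x, c, M, eps):
    """P(y | x) under hypothetical rule c: y = (x + c) mod M, with label noise eps."""
    return 1.0 - eps if y == (x + c) % M else eps / (M - 1)

def predictive(post, x, y, M, eps):
    """Posterior-predictive probability of label y for input x."""
    return sum(w * likelihood(y, x, c, M, eps) for c, w in enumerate(post))

def update(post, x, y, M, eps):
    """Bayes update of the posterior over the M rules after observing (x, y)."""
    new = [w * likelihood(y, x, c, M, eps) for c, w in enumerate(post)]
    z = sum(new)
    return [w / z for w in new]

def excess_description_length(data, M, eps=0.01):
    """Return (EDL in bits, list of per-example online code lengths in bits)."""
    post = [1.0 / M] * M  # uniform prior over the M candidate rules
    online = []
    for x, y in data:  # prequential pass: predict, pay -log2 p(y), then learn
        online.append(-math.log2(predictive(post, x, y, M, eps)))
        post = update(post, x, y, M, eps)
    # Residual cost: re-encode the same training labels with the final posterior.
    final = [-math.log2(predictive(post, x, y, M, eps)) for x, y in data]
    return sum(online) - sum(final), online

M, true_c = 16, 5
data = [((3 * i) % M, ((3 * i) % M + true_c) % M) for i in range(20)]
edl, online = excess_description_length(data, M)
print(f"EDL = {edl:.3f} bits; first example costs {online[0]:.3f} bits")
# prints: EDL = 4.000 bits; first example costs 4.000 bits
```

In this toy, almost all of the roughly log2 16 = 4 bits of EDL come from the first example, which pins down the shift `c`. Later examples cost only the irreducible noise term (about -log2(0.99) bits each), and the final model pays that same term, so they add almost nothing. This mirrors the abstract's point that a single example can eliminate many bits of uncertainty about the underlying rule.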
Similar Papers
EDCO: Dynamic Curriculum Orchestration for Domain-specific Large Language Model Fine-tuning
Machine Learning (CS)
Teaches AI faster by picking the best lessons.
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
Machine Learning (CS)
Makes AI answer questions shorter and smarter.
Unifying Learning Dynamics and Generalization in Transformers Scaling Law
Machine Learning (CS)
Makes AI learn better with more computer power.