Scaling Laws and In-Context Learning: A Unified Theoretical Framework
By: Sushant Mehta, Ishan Gupta
Potential Business Impact:
Explains how making AI models bigger helps them learn new tasks from examples.
In-context learning (ICL) enables large language models to adapt to new tasks from demonstrations without parameter updates. Despite extensive empirical studies, a principled understanding of ICL emergence at scale remains elusive. We present a unified theoretical framework connecting scaling laws to ICL emergence in transformers. Our analysis establishes that ICL performance follows power-law relationships with model depth $L$, width $d$, context length $k$, and training data $D$, with exponents determined by task structure. We show that under specific conditions, transformers implement gradient-based meta-learning in their forward pass, with an effective learning rate $\eta_{\text{eff}} = \Theta(1/\sqrt{Ld})$. We demonstrate sharp phase transitions at critical scales and derive optimal depth-width allocations favoring $L^* \propto N^{2/3}$, $d^* \propto N^{1/3}$ for a fixed parameter budget $N = Ld$. Systematic experiments on synthetic tasks validate our predictions, with measured scaling exponents closely matching theory. This work provides both necessary and sufficient conditions for the emergence of ICL and establishes fundamental computational limits on what transformers can learn in-context.
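The abstract's allocation and learning-rate formulas can be sketched numerically. The snippet below is an illustration of the stated relations only (function names and the constant `c` are assumptions, not the paper's code): it splits a parameter budget $N = Ld$ per $L^* \propto N^{2/3}$, $d^* \propto N^{1/3}$ (shown here with proportionality constants of 1), and evaluates $\eta_{\text{eff}} = \Theta(1/\sqrt{Ld})$ up to an unknown constant.

```python
import math

def optimal_allocation(N):
    """Split a fixed parameter budget N = L * d between depth and width
    using the abstract's claimed optimum L* ~ N^(2/3), d* ~ N^(1/3)
    (proportionality constants taken as 1 for illustration)."""
    L_star = N ** (2 / 3)
    d_star = N ** (1 / 3)
    return L_star, d_star

def effective_learning_rate(L, d, c=1.0):
    """Effective in-context learning rate eta_eff = c / sqrt(L * d);
    the constant c is hidden inside the Theta and assumed here."""
    return c / math.sqrt(L * d)

# For N = 10^6 parameters: depth ~ 10^4 layers, width ~ 10^2 units,
# so the product recovers the budget and eta_eff = 1/sqrt(N).
L, d = optimal_allocation(10 ** 6)
print(L, d, effective_learning_rate(L, d))
```

The $N^{2/3}$ depth exponent implies depth should grow faster than width under this framework, the opposite of the roughly proportional depth-width scaling common in practice.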
Similar Papers
Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning
Computation and Language
AI learns new things from just a few examples.
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
Machine Learning (Stat)
Makes AI learn better by changing its size.
Pretrain-Test Task Alignment Governs Generalization in In-Context Learning
Machine Learning (Stat)
Helps computers learn from examples better.