A Simple Generalisation of the Implicit Dynamics of In-Context Learning
By: Francesco Innocenti, El Mehdi Achour
In-context learning (ICL) refers to the ability of a model to learn new tasks from examples in its input, without any parameter updates. In contrast to previous theories of ICL, which rely on toy models and data settings, it has recently been shown that an abstraction of a transformer block can be seen as implicitly updating the weights of its feedforward network according to the context (Dherin et al., 2025). Here, we provide a simple generalisation of this result to (i) all sequence positions, not only the last; (ii) any transformer block, not only the first; and (iii) more realistic residual blocks including layer normalisation. We empirically verify our theory on simple in-context linear regression tasks and investigate the relationship between the implicit updates associated with different tokens, both within and across blocks. These results help to bring the theory of Dherin et al. (2025) even closer to practice, with potential for validation on large-scale models.
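To make the abstraction concrete, here is a minimal numerical sketch (not the paper's code) of the rank-1 identity that underlies the Dherin et al. (2025) result: if a block applies feedforward weights W to the query token x plus its attention output a = A(C, x), then the context's effect is algebraically equivalent to applying implicitly updated weights W + ΔW to x alone, with ΔW = (W a) xᵀ / ‖x‖². All variable names below are illustrative placeholders, not the paper's notation.

```python
import numpy as np

# Sketch of the implicit-update identity, assuming a block that computes
# M(x + A(C, x)) with a linear first feedforward layer W. The attention
# output a = A(C, x) is replaced by a random vector for illustration.

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))   # first-layer feedforward weights
x = rng.standard_normal(d)        # query token (residual-stream input)
a = rng.standard_normal(d)        # stand-in for the attention output A(C, x)

# Rank-1 implicit weight update induced by the context:
delta_W = np.outer(W @ a, x) / (x @ x)

# Applying the updated weights to the query alone reproduces the effect of
# attending to the context: (W + delta_W) x == W (x + a).
assert np.allclose((W + delta_W) @ x, W @ (x + a))
print("implicit rank-1 update verified")
```

The assertion checks the exact identity (W + ΔW)x = W(x + a), which holds for any W, a and nonzero x; it is this kind of implicit update that the paper generalises to all positions, deeper blocks, and layer-normalised residual streams.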