Asymptotic Study of In-context Learning with Random Transformers through Equivalent Models
By: Samet Demir, Zafer Dogan
Potential Business Impact:
Teaches computers to learn from examples faster.
We study the in-context learning (ICL) capabilities of pretrained Transformers in the setting of nonlinear regression. Specifically, we focus on a random Transformer with a nonlinear MLP head where the first layer is randomly initialized and fixed while the second layer is trained. Furthermore, we consider an asymptotic regime where the context length, input dimension, hidden dimension, number of training tasks, and number of training samples jointly grow. In this setting, we show that the random Transformer behaves equivalently to a finite-degree Hermite polynomial model in terms of ICL error. This equivalence is validated through simulations across varying activation functions, context lengths, hidden layer widths (revealing a double-descent phenomenon), and regularization settings. Our results offer theoretical and empirical insights into when and how MLP layers enhance ICL, and how nonlinearity and over-parameterization influence model performance.
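The setup described in the abstract, a frozen random first layer with only the second layer trained, is a random-features regression, and the claimed equivalent model is a finite-degree Hermite polynomial predictor. The NumPy sketch below is not the authors' code; it is a minimal illustration under assumed choices (ReLU activation, ridge penalty `lam`, a synthetic quadratic target, degree-2 Hermite features of the raw inputs as a stand-in for the paper's equivalent model) showing how the two predictors would be fit and compared on test error.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_train, n_test = 20, 200, 500, 200   # input dim, hidden width, sample sizes (assumed)

# Synthetic nonlinear regression target (illustrative, not from the paper).
w_star = rng.normal(size=d) / np.sqrt(d)
def target(X):
    z = X @ w_star
    return z + 0.5 * z**2

X_tr = rng.normal(size=(n_train, d))
X_te = rng.normal(size=(n_test, d))
y_tr, y_te = target(X_tr), target(X_te)

# Random-features model: first layer W is drawn once and frozen,
# only the second-layer weights `a` are trained (ridge regression).
W = rng.normal(size=(d, k)) / np.sqrt(d)
act = lambda z: np.maximum(z, 0.0)          # assumed activation (ReLU)
Phi_tr, Phi_te = act(X_tr @ W), act(X_te @ W)

lam = 1e-2                                  # assumed ridge penalty
a = np.linalg.solve(Phi_tr.T @ Phi_tr + lam * np.eye(k), Phi_tr.T @ y_tr)
rf_err = np.mean((Phi_te @ a - y_te) ** 2)

# Finite-degree Hermite stand-in: entrywise He_0 = 1, He_1(x) = x, He_2(x) = x^2 - 1,
# also fit by ridge regression on the same data.
H_tr = np.hstack([np.ones((n_train, 1)), X_tr, X_tr**2 - 1])
H_te = np.hstack([np.ones((n_test, 1)), X_te, X_te**2 - 1])
b = np.linalg.solve(H_tr.T @ H_tr + lam * np.eye(H_tr.shape[1]), H_tr.T @ y_tr)
hermite_err = np.mean((H_te @ b - y_te) ** 2)

print(f"random-features test MSE: {rf_err:.4f}")
print(f"Hermite-model test MSE:   {hermite_err:.4f}")
```

In the paper's asymptotic regime the comparison is between ICL errors of a random Transformer and its Hermite equivalent; the sketch above only conveys the frozen-first-layer / trained-second-layer structure and the idea of matching a random-features predictor against a low-degree polynomial one.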
Similar Papers
How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs
Machine Learning (Stat)
Makes AI learn better from examples.
Scaling Laws and In-Context Learning: A Unified Theoretical Framework
Machine Learning (CS)
Makes AI learn new things faster with more data.
Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers
Machine Learning (CS)
Explains how AI learns new tasks from examples.