Transformers Meet In-Context Learning: A Universal Approximation Theory
By: Gen Li, Yuchen Jiao, Yu Huang, and more
Potential Business Impact:
Shows how AI models can pick up new tasks from just a few examples, without any retraining.
Large language models are capable of in-context learning, the ability to perform new tasks at test time using a handful of input-output examples, without parameter updates. We develop a universal approximation theory to elucidate how transformers enable in-context learning. For a general class of functions (each representing a distinct task), we demonstrate how to construct a transformer that, without any further weight updates, can predict based on a few noisy in-context examples with vanishingly small risk. Unlike prior work that frames transformers as approximators of optimization algorithms (e.g., gradient descent) for statistical learning tasks, we integrate Barron's universal function approximation theory with the algorithm approximator viewpoint. Our approach yields approximation guarantees that are not constrained by the effectiveness of the optimization algorithms being mimicked, extending far beyond convex problems like linear regression. The key is to show that (i) any target function can be nearly linearly represented, with small $\ell_1$-norm, over a set of universal features, and (ii) a transformer can be constructed to find the linear representation -- akin to solving Lasso -- at test time.
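To make the two ingredients concrete, here is a minimal sketch in LaTeX; the symbols $\phi_j$ (universal features), $\theta$, $(x_i, y_i)$ (noisy in-context examples), and the regularization parameter $\lambda$ are illustrative notation under the abstract's assumptions, not taken from the paper.

% Sketch of the two key ingredients (illustrative notation).
% (i) Near-linear representation of the target function over universal features,
%     with small \ell_1-norm of the coefficients:
\[
f(x) \;\approx\; \sum_{j=1}^{N} \theta_j \,\phi_j(x),
\qquad \|\theta\|_1 \ \text{small}.
\]
% (ii) At test time, given in-context examples (x_i, y_i), i = 1, \dots, n,
%      the transformer behaves like a Lasso solver and predicts with the result:
\[
\hat{\theta} \;\in\; \arg\min_{\theta}\;
\frac{1}{n}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{N}\theta_j\,\phi_j(x_i)\Big)^{2}
\;+\; \lambda\,\|\theta\|_1,
\qquad
\hat{f}(x) \;=\; \sum_{j=1}^{N}\hat{\theta}_j\,\phi_j(x).
\]

In this view the transformer never updates its weights; it approximates the map from the in-context examples to the Lasso-style solution, and the small $\ell_1$-norm in (i) is what keeps the prediction risk small with only a handful of noisy examples.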
Similar Papers
Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights
Machine Learning (CS)
Makes AI learn better from messy information.
Transformers are almost optimal metalearners for linear classification
Machine Learning (CS)
Computers learn new tasks from a few examples.
Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
Machine Learning (CS)
Teaches computers to learn better from examples.