From Small to Large: Generalization Bounds for Transformers on Variable-Size Inputs
By: Anastasiia Alokhina, Pan Li
Transformers exhibit a notable property of "size generalization": an ability to extrapolate from smaller token sets to significantly longer ones. This behavior has been documented across diverse applications, including point clouds, graphs, and natural language. Despite its empirical success, a rigorous theoretical characterization of this capability is still lacking. In this paper, we develop a theoretical framework to analyze this phenomenon for geometric data, which we represent as discrete samples from a continuous source (e.g., point clouds from manifolds, graphs from graphons). Our core contribution is a bound on the error between the Transformer's output for a discrete sample and its continuous-domain equivalent. We prove that for Transformers with stable positional encodings, this bound is determined by the sampling density and the intrinsic dimensionality of the data manifold. Experiments on graphs and point clouds of various sizes confirm the tightness of our theoretical bound.
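The abstract's central claim, that a Transformer's output on a discrete sample converges to a continuous-domain equivalent as sampling density grows, can be illustrated numerically. The sketch below is not the authors' construction: it uses a single hypothetical softmax self-attention layer with mean pooling (a permutation-invariant readout whose normalized sum approximates an integral over the underlying domain), applied to points sampled uniformly from the unit circle, a 1-D manifold in R^2. A dense sample stands in for the continuous limit, and the output discrepancy shrinks as the sample size n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(X, Wq, Wk, Wv):
    """One softmax self-attention layer followed by mean pooling.

    Mean pooling normalizes the sum over tokens, so the pooled output
    approximates an integral over the continuous source as n grows.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)        # row-wise softmax
    return (A @ V).mean(axis=0)              # pooled output, shape (d,)

def sample_circle(n):
    """n points drawn uniformly from the unit circle (1-D manifold in R^2)."""
    t = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.stack([np.cos(t), np.sin(t)], axis=1)

d = 2
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

# A very dense sample serves as a stand-in for the continuous-domain output.
ref = attention_pool(sample_circle(8192), Wq, Wk, Wv)

errors = {}
for n in [32, 128, 512, 2048]:
    trials = [np.linalg.norm(attention_pool(sample_circle(n), Wq, Wk, Wv) - ref)
              for _ in range(20)]
    errors[n] = float(np.mean(trials))
    print(f"n={n:5d}  mean discretization error: {errors[n]:.4f}")
```

The error decays roughly like a Monte Carlo integration error, consistent with the paper's message that the discrete-to-continuous gap is governed by sampling density; the exact rate and constants in the paper's bound also involve the manifold's intrinsic dimension, which this toy example does not probe.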
Similar Papers
Quantitative Bounds for Length Generalization in Transformers
Machine Learning (CS)
Establishes quantitative bounds on how well Transformers generalize from shorter to longer inputs.
Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights
Machine Learning (CS)
Studies the approximation and generalization behavior of Transformers learning on noisy manifolds.
Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective
Machine Learning (CS)
Shows, from an approximation-theoretic perspective, how Transformers can overcome the curse of dimensionality.