Born a Transformer -- Always a Transformer?
By: Yana Veitsman, Mayank Jobanputra, Yash Sarrof and more
Potential Business Impact:
Pretrained models look up information more reliably in one direction than the other, a reliability risk for retrieval tasks.
Transformers have theoretical limitations in modeling certain sequence-to-sequence tasks, yet it remains largely unclear whether these limitations play a role in large-scale pretrained LLMs, or whether the scale of both the models themselves and their pretraining data lets LLMs effectively overcome these constraints in practice. We explore how these architectural constraints manifest after pretraining by studying a family of $\textit{retrieval}$ and $\textit{copying}$ tasks inspired by Liu et al. [2024a]. We use a recently proposed framework for studying length generalization [Huang et al., 2025] to provide guarantees for each of our settings. Empirically, we observe an $\textit{induction-versus-anti-induction}$ asymmetry, where pretrained models are better at retrieving tokens to the right (induction) than to the left (anti-induction) of a query token. This asymmetry disappears upon targeted fine-tuning when length generalization is theoretically guaranteed. Mechanistic analysis reveals that this asymmetry reflects differences in the strength of induction versus anti-induction circuits within pretrained transformers. We validate our findings with experiments on real-world tasks that demonstrate the resulting reliability risks. Our results highlight that pretraining selectively enhances certain transformer capabilities, but does not overcome fundamental length-generalization limits.
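To make the task family concrete, here is a minimal sketch of how an induction versus anti-induction retrieval probe could be constructed. This is our own illustration over a toy integer vocabulary, assuming the paper's setup of a context with a marked query token; the helper name `make_retrieval_example` and its parameters are hypothetical, not taken from the paper's evaluation code.

```python
import random

def make_retrieval_example(vocab_size=50, length=20, direction="induction", seed=None):
    """Build one retrieval probe: a random token sequence containing a
    query token q exactly once, with the answer next to it.

    induction:      ... q a ...  q  -> model should emit a (token to the RIGHT of q)
    anti-induction: ... a q ...  q  -> model should emit a (token to the LEFT of q)
    """
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size) for _ in range(length)]
    q = vocab_size  # reserve a fresh id for the query token
    pos = rng.randrange(1, length - 1)  # leave room on both sides of q
    tokens[pos] = q
    answer = tokens[pos + 1] if direction == "induction" else tokens[pos - 1]
    # The model sees the context followed by a repeat of the query token,
    # and must predict `answer` as the next token.
    prompt = tokens + [q]
    return prompt, answer

prompt, answer = make_retrieval_example(direction="anti-induction", seed=0)
print(prompt, "->", answer)
```

Comparing next-token accuracy on probes of both directions, across context lengths longer than those seen in training, would surface the induction-versus-anti-induction asymmetry the abstract describes.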
Similar Papers
Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers
Machine Learning (CS)
Studies when models memorize facts versus generalize to new problems.
On the Generalizability of Transformer Models to Code Completions of Different Lengths
Software Engineering
Tests whether code models handle completions of different lengths.
A Survey on Large Language Models with some Insights on their Capabilities and Limitations
Computation and Language
Surveys what large language models can and cannot do.