Memory Limitations of Prompt Tuning in Transformers
By: Maxime Meyer, Mario Michelessa, Caroline Chaux, and more
Potential Business Impact:
Computers forget things when given too much information.
Despite the empirical success of prompt tuning in adapting pretrained language models to new tasks, theoretical analyses of its capabilities remain limited. Existing theoretical work primarily addresses universal approximation properties, demonstrating results comparable to standard weight tuning. In this paper, we explore a different aspect of the theory of transformers: the memorization capability of prompt tuning. We provide two principal theoretical contributions. First, we prove that the amount of information memorized by a transformer cannot scale faster than linearly with the prompt length. Second, and more importantly, we present the first formal proof of a phenomenon empirically observed in large language models: performance degradation in transformers with extended contexts. We rigorously demonstrate that transformers inherently have limited memory, constraining the amount of information they can retain regardless of context size. This finding offers a fundamental understanding of the intrinsic limitations of transformer architectures, particularly in their ability to handle long sequences.
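The linear-scaling claim can be pictured with a simple counting sketch (an illustration introduced here, not the paper's actual proof; the symbols m, d, and p are assumptions): a tuned prompt of length m, with token embeddings in R^d and each coordinate stored at p bits of precision, can be fully described by at most

\[
  I_{\text{prompt}} \;\le\; m \cdot d \cdot p \;=\; \mathcal{O}(m) \quad \text{bits},
\]

so, under these assumptions, no downstream mechanism can extract more than O(m) bits of memorized information from the prompt alone, which is consistent with the linear upper bound stated above.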
Similar Papers
Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers
Machine Learning (CS)
Makes computers remember facts or solve new problems.
Memory-Augmented Transformers: A Systematic Review from Neuroscience Principles to Technical Solutions
Machine Learning (CS)
Computers remember more, learn longer, and think better.
Human-like fleeting memory improves language learning but impairs reading time prediction in transformer language models
Computation and Language
Makes computers learn language better, but not predict reading.