Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
By: Yoav Gelberg, Koshi Eguchi, Takuya Akiba, and more
Potential Business Impact:
Makes computers understand longer stories without retraining.
To date, effectively extending the context of language models (LMs) has required expensive finetuning on sequences longer than the pretraining length. In this work, we break this key bottleneck by Dropping the Positional Embeddings of LMs after training (DroPE). Our simple method is motivated by three key theoretical and empirical observations. First, positional embeddings (PEs) serve a crucial role during pretraining, providing an important inductive bias that significantly facilitates convergence. Second, over-reliance on this explicit positional information is precisely what prevents test-time generalization to sequences of unseen length, even when using popular PE-scaling methods. Third, positional embeddings are not an inherent requirement of effective language modeling and can be safely removed after pretraining, following a short recalibration phase. Empirically, DroPE yields seamless zero-shot context extension without any long-context finetuning, quickly adapting pretrained LMs without compromising their capabilities in the original training context. Our findings hold across different models and dataset sizes, far outperforming previous specialized architectures and established rotary positional embedding scaling methods.
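To illustrate the core idea, here is a minimal sketch (not the authors' implementation): a toy single-head causal attention layer with a use_rope switch. Dropping the positional embedding amounts to turning the rotary rotation off after pretraining; causal masking alone still imposes an implicit ordering on tokens, and a short recalibration phase lets the weights adapt. The SimpleAttention module, the use_rope flag, and all constants below are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleAttention(nn.Module):
    """Toy single-head causal attention with an optional rotary embedding (RoPE)."""

    def __init__(self, dim: int, use_rope: bool = True):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        # Train with use_rope=True; DroPE-style context extension would set it
        # to False after pretraining, followed by a short recalibration run.
        self.use_rope = use_rope

    def rope(self, x: torch.Tensor) -> torch.Tensor:
        # Standard rotary embedding, applied over split halves of the feature dim.
        seq, dim = x.shape[-2], x.shape[-1]
        half = dim // 2
        freqs = 1.0 / (10000 ** (torch.arange(half, dtype=torch.float32, device=x.device) / half))
        angles = torch.arange(seq, dtype=torch.float32, device=x.device)[:, None] * freqs[None, :]
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., :half], x[..., half:]
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if self.use_rope:
            q, k = self.rope(q), self.rope(k)
        # With use_rope=False, no explicit positional signal is injected;
        # the causal mask still restricts each token to attend to its past.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y)
```

In this toy setting, "dropping" the PE would mean flipping use_rope to False on a pretrained layer and briefly finetuning on sequences no longer than the original training context, mirroring the abstract's claim that recalibration requires no long-context data.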
Similar Papers
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Computation and Language
Lets computers remember much longer stories.
LaMPE: Length-aware Multi-grained Position Encoding for Adaptive Long-context Scaling Without Training
Computation and Language
Lets AI understand much longer texts.