Beyond Semantics: How Temporal Biases Shape Retrieval in Transformer and State-Space Models
By: Anooshka Bajaj, Deven Mahesh Mistry, Sahaj Singh Maini, and others
Potential Business Impact:
Helps computers remember events in order.
In-context learning is governed by both temporal and semantic relationships, shaping how Large Language Models (LLMs) retrieve contextual information. Analogous to human episodic memory, where the retrieval of specific events is enabled by separating events that happened at different times, this work probes the ability of various pretrained LLMs, including transformer and state-space models, to differentiate and retrieve temporally separated events. Specifically, we prompted models with sequences containing multiple presentations of the same token, which reappears at the sequence end. By fixing the positions of these repeated tokens and permuting all others, we removed semantic confounds and isolated temporal effects on next-token prediction. Across diverse sequences, models consistently placed the highest probabilities on tokens following a repeated token, but with a notable bias for those nearest the beginning or end of the input. An ablation experiment linked this phenomenon in transformers to induction heads. Extending the analysis to unique semantic contexts with partial overlap further demonstrated that memories embedded in the middle of a prompt are retrieved less reliably. Despite architectural differences, state-space and transformer models showed comparable temporal biases. Our findings deepen the understanding of temporal biases in in-context learning and offer an illustration of how these biases can enable temporal separation and episodic retrieval.
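The probe design described above can be sketched in a few lines: pin the repeated token at fixed positions (and at the sequence end), then fill every other slot with a fresh permutation of distinct filler tokens so that only temporal structure, not semantics, varies across trials. This is an illustrative reconstruction, not the authors' released code; the function name and parameters are hypothetical.

```python
import random

def build_probe_sequence(vocab, repeat_token, repeat_positions, length, seed=0):
    """Build one probe sequence for the temporal-retrieval experiment.

    `repeat_token` is pinned at the slots in `repeat_positions` and appended
    again at the end of the sequence (the "query"). All remaining slots are
    filled with a random permutation of the other vocabulary tokens, so the
    repeated token's positions stay fixed while semantic context is shuffled.
    (Hypothetical helper; the paper's actual implementation is not shown.)
    """
    rng = random.Random(seed)
    fillers = [t for t in vocab if t != repeat_token]
    rng.shuffle(fillers)
    seq, next_filler = [], 0
    for pos in range(length):
        if pos in repeat_positions:
            seq.append(repeat_token)
        else:
            seq.append(fillers[next_filler])
            next_filler += 1
    seq.append(repeat_token)  # the repeated token reappears at the sequence end
    return seq
```

Given such sequences, one would feed each to the model and read off the next-token distribution after the final repeated token, comparing the probability mass assigned to the tokens that followed each earlier occurrence; a primacy/recency bias shows up as higher mass on successors of the earliest and latest occurrences.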
Similar Papers
From Time and Place to Preference: LLM-Driven Geo-Temporal Context in Recommendations
Information Retrieval
Helps movie suggestions understand holidays and seasons.
From n-gram to Attention: How Model Architectures Learn and Propagate Bias in Language Modeling
Computation and Language
Finds why computer words have unfair ideas.
Causality Matters: How Temporal Information Emerges in Video Language Models
CV and Pattern Recognition
Lets computers understand video time without special time codes.