Learning to Focus: Prioritizing Informative Histories with Structured Attention Mechanisms in Partially Observable Reinforcement Learning
By: Daniel De Dios Allegue, Jinke He, Frans A. Oliehoek
Potential Business Impact:
Teaches robots to learn faster from past actions.
Transformers have shown strong ability to model long-term dependencies and are increasingly adopted as world models in model-based reinforcement learning (RL) under partial observability. However, unlike natural language corpora, RL trajectories are sparse and reward-driven, making standard self-attention inefficient because it distributes weight uniformly across all past tokens rather than emphasizing the few transitions critical for control. To address this, we introduce structured inductive priors into the self-attention mechanism of the dynamics head: (i) per-head memory-length priors that constrain attention to task-specific windows, and (ii) distributional priors that learn smooth Gaussian weightings over past state-action pairs. We integrate these mechanisms into UniZero, a model-based RL agent with a Transformer-based world model that supports planning under partial observability. Experiments on the Atari 100k benchmark show that most efficiency gains arise from the Gaussian prior, which smoothly allocates attention to informative transitions, while memory-length priors often truncate useful signals with overly restrictive cut-offs. In particular, Gaussian Attention achieves a 77% relative improvement in mean human-normalized scores over UniZero. These findings suggest that in partially observable RL domains with non-stationary temporal dependencies, discrete memory windows are difficult to learn reliably, whereas smooth distributional priors flexibly adapt across horizons and yield more robust data efficiency. Overall, our results demonstrate that encoding structured temporal priors directly into self-attention improves the prioritization of informative histories for dynamics modeling under partial observability.
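To make the second mechanism concrete, below is a minimal sketch of what a smooth Gaussian prior over past tokens could look like inside a causal self-attention layer: each head learns a preferred lag (mu) and spread (sigma), and a Gaussian bias over relative distances is added to the attention logits before the softmax. This is an illustrative reconstruction under stated assumptions, not the authors' UniZero implementation; the class name, parameterization, and shapes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianPriorAttention(nn.Module):
    """Illustrative multi-head causal self-attention with a learnable
    per-head Gaussian prior over relative distance to past tokens
    (hypothetical sketch, not the paper's code)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Per-head mean (preferred lag) and log-std (spread) of the Gaussian
        # weighting over how far back in the trajectory the head attends.
        self.mu = nn.Parameter(torch.zeros(n_heads))
        self.log_sigma = nn.Parameter(torch.zeros(n_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (B, heads, T, d_head).
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        # Standard scaled dot-product logits: (B, heads, T, T).
        logits = q @ k.transpose(-2, -1) / self.d_head ** 0.5

        # Relative distance from each query position i to key position j (>= 0).
        pos = torch.arange(T, device=x.device)
        dist = (pos[:, None] - pos[None, :]).clamp(min=0).float()  # (T, T)
        mu = self.mu[:, None, None]                                 # (H, 1, 1)
        sigma = self.log_sigma.exp()[:, None, None]                 # (H, 1, 1)
        # Smooth Gaussian bias peaking at each head's preferred lag.
        prior = -0.5 * ((dist[None] - mu) / sigma) ** 2             # (H, T, T)

        # Causal mask: attend only to current and past tokens.
        causal = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        logits = (logits + prior[None]).masked_fill(~causal, float("-inf"))

        attn = F.softmax(logits, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.out(out)
```

In this sketch the prior stays differentiable, so each head can shift its mu and sigma during training to emphasize whichever past state-action pairs are informative, rather than committing to a hard memory-length cut-off.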
Similar Papers
Is Random Attention Sufficient for Sequence Modeling? Disentangling Trainable Components in the Transformer
Machine Learning (CS)
Lets computers learn by focusing on important words.
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
Machine Learning (CS)
Helps computers understand different kinds of information together.
Why Generate When You Can Transform? Unleashing Generative Attention for Dynamic Recommendation
Information Retrieval
Helps apps guess what you'll like next.