Modeling Language as a Sequence of Thoughts
By: Nasim Borazjanizadeh, James McClelland
Transformer language models can generate strikingly natural text by modeling language as a sequence of tokens. Yet, because they rely primarily on surface-level co-occurrence statistics, they fail to form globally consistent latent representations of entities and events, a shortcoming that contributes to brittleness in relational direction (e.g., the reversal curse), contextualization errors, and data inefficiency. Cognitive science, in contrast, shows that human comprehension involves converting the incoming linguistic stream into compact, event-like representations that persist in memory while verbatim form is short-lived. Motivated by this view, we introduce the Thought Gestalt (TG) model, a recurrent Transformer that models language at two levels of abstraction: tokens and sentence-level "thought" states. TG generates the tokens of one sentence at a time while cross-attending to a memory of prior sentence representations. Token and sentence representations are produced by the same set of model parameters and trained with a single objective, next-token cross-entropy: because the computation graph of the sentence representations written to memory is retained, gradients from future token losses flow backward through cross-attention to optimize the parameters that generated earlier sentence vectors. In scaling experiments, TG consistently improves efficiency over matched GPT-2 runs and other baselines, with scaling fits indicating that GPT-2 requires ~5-8% more data and ~33-42% more parameters to match TG's loss. TG also reduces errors on relational-direction generalization in a father-son reversal-curse probe.
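
To make the mechanism in the abstract concrete, the following is a minimal PyTorch sketch of the two-level training loop: tokens are processed one sentence at a time, each sentence is pooled into a "thought" vector appended to a memory that later sentences cross-attend to, and the memory keeps its computation graph so losses on later sentences backpropagate into earlier sentence states. The module names, mean pooling, and all hyperparameters are illustrative assumptions, not the paper's implementation.

# Minimal sketch of sentence-at-a-time language modeling with a shared-parameter
# "thought" memory. Assumed design choices are noted in comments.
import torch
import torch.nn as nn


class SentenceBlock(nn.Module):
    """Causal self-attention over the current sentence plus cross-attention to memory."""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, memory):
        # Causal self-attention within the sentence.
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + h)
        # Cross-attention to prior sentence ("thought") vectors, if any exist.
        if memory is not None:
            h, _ = self.cross_attn(x, memory, memory)
            x = self.norm2(x + h)
        return self.norm3(x + self.ff(x))


class ThoughtGestaltSketch(nn.Module):
    """The same parameters produce both token logits and the sentence vector."""

    def __init__(self, vocab_size=1000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.block = SentenceBlock(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, sentences):
        # `sentences` is a list of (batch, seq_len) token tensors, one per sentence.
        loss_fn = nn.CrossEntropyLoss()
        memory, total_loss = None, 0.0
        for sent in sentences:
            h = self.block(self.embed(sent), memory)          # (B, T, D)
            logits = self.lm_head(h[:, :-1])                  # next-token prediction
            total_loss = total_loss + loss_fn(
                logits.reshape(-1, logits.size(-1)), sent[:, 1:].reshape(-1))
            # Pool the sentence into a single "thought" vector (mean pooling is an
            # assumption) and append it WITHOUT detaching, so gradients from future
            # token losses flow back through cross-attention into this sentence.
            thought = h.mean(dim=1, keepdim=True)             # (B, 1, D)
            memory = thought if memory is None else torch.cat([memory, thought], dim=1)
        return total_loss / len(sentences)


if __name__ == "__main__":
    model = ThoughtGestaltSketch()
    batch = [torch.randint(0, 1000, (2, 12)) for _ in range(3)]  # 3 toy sentences
    loss = model(batch)
    loss.backward()  # gradients reach the parameters that built earlier thought vectors

The single next-token cross-entropy objective is what distinguishes this setup from memory models trained with auxiliary losses: the only reason the sentence vectors become useful is that retaining their graph lets future-token gradients shape them.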