Token Maturation: Autoregressive Language Generation via Continuous Token Dynamics
By: Oshri Naparstek
Potential Business Impact:
Makes AI write better by thinking longer first.
Autoregressive language models are conventionally defined over discrete token sequences, committing to a specific token at every generation step. This early discretization forces uncertainty to be resolved through token-level sampling, often leading to instability, repetition, and sensitivity to decoding heuristics. In this work, we introduce a continuous autoregressive formulation of language generation in which tokens are represented as continuous vectors that "mature" over multiple update steps before being discretized. Rather than sampling tokens, the model evolves continuous token representations through a deterministic dynamical process, committing to a discrete token only when the representation has sufficiently converged. Discrete text is recovered via hard decoding, while uncertainty is maintained and resolved in the continuous space. We show that this maturation process alone is sufficient to produce coherent and diverse text using deterministic decoding (argmax), without reliance on token-level sampling, diffusion-style denoising, or auxiliary stabilization mechanisms. Additional perturbations, such as stochastic dynamics or history smoothing, can be incorporated naturally but are not required for the model to function. To our knowledge, this is the first autoregressive language model that generates text by evolving continuous token representations to convergence prior to discretization, enabling stable generation without token-level sampling.
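To make the maturation loop concrete, here is a minimal sketch in PyTorch of how such a procedure might look. The interface is assumed, not taken from the paper: `model.step` stands in for one deterministic update of the continuous token representation given the context, `model.unembed` for the vocabulary projection used at hard-decoding time, and `max_steps` / `tol` are illustrative convergence hyperparameters.

```python
import torch

def mature_token(model, context, dim, max_steps=32, tol=1e-3):
    """Evolve a continuous token vector until it converges, then discretize.

    Hypothetical sketch: `model.step(context, z)` is assumed to return an
    updated continuous representation, and `model.unembed(z)` the vocabulary
    logits. Neither name comes from the paper itself.
    """
    z = torch.zeros(dim)                       # initial continuous token
    for _ in range(max_steps):
        z_next = model.step(context, z)        # one deterministic update
        converged = torch.norm(z_next - z) < tol
        z = z_next
        if converged:                          # representation has matured
            break
    return model.unembed(z).argmax().item()    # hard (argmax) decoding
```

Note that the loop is fully deterministic: as the abstract states, no token-level sampling is involved, and the point of commitment is governed by convergence of the continuous representation rather than by a sampling temperature or other decoding heuristic.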
Similar Papers
Continuous Autoregressive Language Models
Computation and Language
Makes AI write faster by thinking in chunks.
Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows
Machine Learning (CS)
Lets computers understand words better, faster, and more flexibly.
Continuous Diffusion Model for Language Modeling
Machine Learning (CS)
Makes computers write better by learning language patterns.