Autoregressivity in the Latent Space of a GP-VAE Language Model: An Empirical Ablation Study
By: Yves Ruffenach
This paper presents an ablation-based analysis of latent autoregression in GP-VAE language models, building on our previous work introducing the architecture. Language models typically rely on an autoregressive factorization over tokens; our prior work instead proposed shifting the sequential structure to the latent space through a causal Gaussian process, paired with a non-autoregressive decoder. Here, we systematically ablate the latent autoregression and compare (i) the full GP-VAE model with autoregressive latent dynamics, (ii) a non-autoregressive ablation in which the latent variables are independent, and (iii) a standard token-level autoregressive Transformer. Within the regime considered (medium-scale corpora and short training contexts), latent autoregression yields latent trajectories that are significantly more compatible with the Gaussian-process prior and exhibit greater long-horizon stability, whereas removing it degrades the latent structure and destabilizes long-range behavior. These findings position latent autoregression as an effective mechanism for organizing long-range structure, complementary to token-level autoregressive modeling. They should be read as an empirical analysis of representational structure rather than as a proposal for a new architecture.
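To make the contrast between conditions (i) and (ii) concrete, the sketch below samples latent trajectories from the two priors and measures their lag-1 autocorrelation. It is illustrative only, not the paper's implementation: an AR(1)/Ornstein-Uhlenbeck recursion stands in for a generic causal Gaussian-process prior, and the sequence length `T`, latent dimension `D`, and coefficient `rho` are assumed values rather than the paper's settings.

```python
# Minimal sketch (not the paper's implementation): contrasting an
# autoregressive latent prior with the independent-latent ablation.
# The AR(1) recursion is one Markovian instance of a causal GP prior;
# the paper's actual kernel, latent size, and decoder are not given here.
import numpy as np

rng = np.random.default_rng(0)
T, D = 256, 16   # sequence length and latent dimension (illustrative)
rho = 0.95       # AR(1) coefficient; controls the temporal correlation length

def sample_causal_gp(T, D, rho, rng):
    """Latent trajectory with autoregressive dynamics:
    z_t = rho * z_{t-1} + sqrt(1 - rho^2) * eps_t,
    so each z_t is N(0, I) marginally but successive latents are correlated."""
    z = np.zeros((T, D))
    z[0] = rng.standard_normal(D)
    for t in range(1, T):
        z[t] = rho * z[t - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal(D)
    return z

def sample_independent(T, D, rng):
    """Ablated prior: latents drawn i.i.d. N(0, I), no temporal structure."""
    return rng.standard_normal((T, D))

def lag1_autocorr(z):
    """Average lag-1 autocorrelation across latent dimensions."""
    zc = z - z.mean(axis=0)
    num = (zc[1:] * zc[:-1]).sum(axis=0)
    den = (zc**2).sum(axis=0)
    return float((num / den).mean())

z_ar = sample_causal_gp(T, D, rho, rng)
z_iid = sample_independent(T, D, rng)
print(f"lag-1 autocorrelation, causal GP prior:   {lag1_autocorr(z_ar):.3f}")
print(f"lag-1 autocorrelation, independent prior: {lag1_autocorr(z_iid):.3f}")
```

Under these assumptions the autoregressive prior produces a lag-1 autocorrelation near `rho`, while the independent prior's is near zero, which is the kind of temporal structure the ablation study probes at longer horizons.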