From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models
By: Andrew Kiruluta
Potential Business Impact:
Makes computers understand words much faster.
We propose a novel spectral generative modeling framework for natural language processing that jointly learns a global time-varying Fourier dictionary and per-token mixing coefficients, replacing the ubiquitous self-attention mechanism in transformer architectures. By enforcing reconstruction losses in both the time domain (embedding reconstruction) and the frequency domain (via Short-Time Fourier Transform magnitude matching) alongside a standard language modeling objective, and by fitting a Gaussian Mixture Model (GMM) prior over the learned mixing vectors, our approach achieves competitive perplexity and generation quality on standard benchmarks such as WikiText-2 and Penn Treebank. In contrast to the quadratic computational complexity of self-attention, our method operates with linear complexity, delivering substantial efficiency gains. We demonstrate that spectral dictionary models achieve performance competitive with transformer baselines while significantly reducing inference latency and memory footprint, offering a compelling alternative for scalable language modeling.
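To make the mechanism concrete, here is a minimal PyTorch sketch of the idea described in the abstract: each token's embedding is reconstructed as a linear combination of a small bank of learned sinusoidal atoms, with per-token mixing coefficients produced by a linear map (no token-to-token attention), and training combines a language-modeling loss with time-domain and STFT-magnitude reconstruction losses. All names and hyperparameters (SpectralDictionaryLM, n_atoms, w_freq, and so on) are illustrative assumptions, not the authors' implementation, and the GMM prior over mixing vectors is fit separately after training.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class SpectralDictionaryLM(nn.Module):
    """Tokens map to mixing coefficients over a learned bank of sinusoidal
    (Fourier) atoms; the mixed atoms reconstruct the embedding and feed the
    next-token head. Cost is linear in sequence length."""

    def __init__(self, vocab_size, d_model=256, n_atoms=64, n_fft=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable Fourier dictionary: per-atom frequency, phase, amplitude.
        self.freq = nn.Parameter(torch.rand(n_atoms, 1) * math.pi)
        self.phase = nn.Parameter(torch.zeros(n_atoms, 1))
        self.amp = nn.Parameter(torch.ones(n_atoms, 1))
        self.mix = nn.Linear(d_model, n_atoms)      # per-token coefficients
        self.head = nn.Linear(d_model, vocab_size)  # next-token logits
        self.d_model, self.n_fft = d_model, n_fft

    def atoms(self):
        # Evaluate every atom on a grid over the embedding dimension: (K, d).
        t = torch.arange(self.d_model, dtype=torch.float32,
                         device=self.freq.device)
        return self.amp * torch.cos(self.freq * t + self.phase)

    def forward(self, tokens):
        x = self.embed(tokens)             # (B, T, d)
        coeffs = self.mix(x)               # (B, T, K), no token-token mixing
        recon = coeffs @ self.atoms()      # (B, T, d) spectral reconstruction
        return self.head(recon), recon, x, coeffs


def stft_mag(z, n_fft):
    # STFT magnitudes along the sequence axis, one signal per embedding dim.
    B, T, d = z.shape
    flat = z.transpose(1, 2).reshape(B * d, T)
    win = torch.hann_window(n_fft, device=z.device)
    return torch.stft(flat, n_fft=n_fft, window=win, return_complex=True).abs()


def training_loss(model, tokens, w_time=1.0, w_freq=0.1):
    logits, recon, x, _ = model(tokens[:, :-1])
    lm = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         tokens[:, 1:].reshape(-1))           # LM objective
    time_loss = F.mse_loss(recon, x)                          # time domain
    n_fft = min(model.n_fft, recon.size(1))
    freq_loss = F.mse_loss(stft_mag(recon, n_fft), stft_mag(x, n_fft))
    return lm + w_time * time_loss + w_freq * freq_loss
```

After training, the per-token coefficients returned by the model can be pooled across a corpus and a Gaussian mixture (for example, scikit-learn's GaussianMixture) fit on them to serve as the generative prior over mixing vectors mentioned in the abstract.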
Similar Papers
Spectral Dictionary Learning for Generative Image Modeling
CV and Pattern Recognition
Creates new pictures from sound-like waves.
Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition
Computation and Language
Makes computers understand language much faster.
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache
Computation and Language
Makes AI remember more without slowing down.