From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models
By: Andrew Kiruluta
Potential Business Impact:
Makes computers understand words much faster.
We propose a novel spectral generative modeling framework for natural language processing that jointly learns a global time-varying Fourier dictionary and per-token mixing coefficients, replacing the ubiquitous self-attention mechanism in transformer architectures. By enforcing reconstruction losses in both the time domain (embedding reconstruction) and the frequency domain (via Short-Time Fourier Transform magnitude matching) alongside a standard language modeling objective, and by fitting a Gaussian Mixture Model (GMM) prior over the learned mixing vectors, our approach achieves competitive perplexity and generation quality on standard benchmarks such as WikiText-2 and Penn Treebank. In contrast to the quadratic computational complexity of self-attention, our method operates with linear complexity, delivering substantial efficiency gains. We demonstrate that spectral dictionary models achieve performance competitive with transformer baselines while significantly reducing inference latency and memory footprint, offering a compelling alternative for scalable language modeling.
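To make the mechanism concrete, here is a minimal PyTorch sketch of the idea described in the abstract: each token's embedding is reconstructed as a linear combination of a small bank of learned sinusoidal atoms, with per-token mixing coefficients produced by a linear map (no token-to-token attention), and training combines a language-modeling loss with time-domain and STFT-magnitude reconstruction losses. All names and hyperparameters (SpectralDictionaryLM, n_atoms, w_freq, and so on) are illustrative assumptions, not the authors' implementation, and the GMM prior over mixing vectors is fit separately after training.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class SpectralDictionaryLM(nn.Module):
    """Tokens map to mixing coefficients over a learned bank of sinusoidal
    (Fourier) atoms; the mixed atoms reconstruct the embedding and feed the
    next-token head. Cost is linear in sequence length."""

    def __init__(self, vocab_size, d_model=256, n_atoms=64, n_fft=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable Fourier dictionary: per-atom frequency, phase, amplitude.
        self.freq = nn.Parameter(torch.rand(n_atoms, 1) * math.pi)
        self.phase = nn.Parameter(torch.zeros(n_atoms, 1))
        self.amp = nn.Parameter(torch.ones(n_atoms, 1))
        self.mix = nn.Linear(d_model, n_atoms)      # per-token coefficients
        self.head = nn.Linear(d_model, vocab_size)  # next-token logits
        self.d_model, self.n_fft = d_model, n_fft

    def atoms(self):
        # Evaluate every atom on a grid over the embedding dimension: (K, d).
        t = torch.arange(self.d_model, dtype=torch.float32,
                         device=self.freq.device)
        return self.amp * torch.cos(self.freq * t + self.phase)

    def forward(self, tokens):
        x = self.embed(tokens)             # (B, T, d)
        coeffs = self.mix(x)               # (B, T, K), no token-token mixing
        recon = coeffs @ self.atoms()      # (B, T, d) spectral reconstruction
        return self.head(recon), recon, x, coeffs


def stft_mag(z, n_fft):
    # STFT magnitudes along the sequence axis, one signal per embedding dim.
    B, T, d = z.shape
    flat = z.transpose(1, 2).reshape(B * d, T)
    win = torch.hann_window(n_fft, device=z.device)
    return torch.stft(flat, n_fft=n_fft, window=win, return_complex=True).abs()


def training_loss(model, tokens, w_time=1.0, w_freq=0.1):
    logits, recon, x, _ = model(tokens[:, :-1])
    lm = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         tokens[:, 1:].reshape(-1))           # LM objective
    time_loss = F.mse_loss(recon, x)                          # time domain
    n_fft = min(model.n_fft, recon.size(1))
    freq_loss = F.mse_loss(stft_mag(recon, n_fft), stft_mag(x, n_fft))
    return lm + w_time * time_loss + w_freq * freq_loss
```

After training, the per-token coefficients returned by the model can be pooled across a corpus and a Gaussian mixture (for example, scikit-learn's GaussianMixture) fit on them to serve as the generative prior over mixing vectors mentioned in the abstract.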
Similar Papers
Spectral Dictionary Learning for Generative Image Modeling
CV and Pattern Recognition
Creates new pictures from sound-like waves.
Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition
Computation and Language
Makes computers understand language much faster.
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache
Computation and Language
Makes AI remember more without slowing down.