Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition
By: Andrew Kiruluta, Eric Lundy, Priscilla Burity
Potential Business Impact:
Makes computers understand language much faster.
Existing sequence-to-sequence models for structured language tasks rely heavily on the dot-product self-attention mechanism, which incurs quadratic complexity in both computation and memory with respect to the input length N. We introduce the Graph Wavelet Transformer (GWT), a novel architecture that replaces this bottleneck with a learnable, multi-scale wavelet transform defined over an explicit graph Laplacian derived from syntactic or semantic parses. Our analysis shows that multi-scale spectral decomposition offers an interpretable, efficient, and expressive alternative to quadratic self-attention for graph-structured sequence modeling.
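To make the idea concrete, the sketch below shows one way a multi-scale graph-wavelet mixing layer could look in PyTorch: the parse graph's normalized Laplacian is eigendecomposed, node features are band-pass filtered at a few learnable scales with a heat-kernel wavelet, and the filtered bands are fused back to the model dimension. The class name, scale parameterization, wavelet kernel, and dense eigendecomposition are illustrative assumptions for this sketch, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphWaveletLayer(nn.Module):
    """Sketch of a multi-scale graph wavelet mixing layer (assumed design).

    Node features are projected into the spectrum of the normalized graph
    Laplacian, band-pass filtered at several learnable scales, and projected
    back, replacing token-to-token attention with spectral mixing over an
    explicit parse graph.
    """

    def __init__(self, d_model: int, num_scales: int = 4):
        super().__init__()
        # Learnable wavelet scales, kept positive via softplus.
        self.log_scales = nn.Parameter(torch.linspace(-1.0, 1.0, num_scales))
        # Fuse the per-scale bands back to d_model.
        self.mix = nn.Linear(d_model * num_scales, d_model)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (N, d_model) node features for one parsed sentence
        # adj: (N, N) symmetric adjacency of the syntactic/semantic parse graph
        deg = adj.sum(-1)
        d_inv_sqrt = torch.where(deg > 0, deg.rsqrt(), torch.zeros_like(deg))
        lap = torch.eye(adj.size(0)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

        # Dense eigendecomposition for clarity; a practical implementation
        # would likely use a polynomial (e.g. Chebyshev) approximation instead.
        evals, evecs = torch.linalg.eigh(lap)              # (N,), (N, N)

        scales = F.softplus(self.log_scales)               # (S,)
        # Heat-kernel band-pass wavelet: g(s * lambda) = s*lambda * exp(-s*lambda)
        arg = scales[:, None] * evals[None, :]             # (S, N)
        kernels = arg * torch.exp(-arg)                    # (S, N)

        x_hat = evecs.T @ x                                # spectral coefficients (N, d)
        bands = [evecs @ (k[:, None] * x_hat) for k in kernels]
        return self.mix(torch.cat(bands, dim=-1))          # (N, d_model)


if __name__ == "__main__":
    # Toy usage: a 5-token chain-structured parse graph.
    n, d = 5, 16
    adj = torch.zeros(n, n)
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    layer = GraphWaveletLayer(d_model=d)
    out = layer(torch.randn(n, d), adj)
    print(out.shape)  # torch.Size([5, 16])
```

Because the wavelet filters act on the Laplacian spectrum rather than on all token pairs, the per-layer cost is governed by the spectral transform (or its polynomial approximation) instead of the quadratic attention matrix.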
Similar Papers
Learnable Multi-Scale Wavelet Transformer: A Novel Alternative to Self-Attention
Machine Learning (CS)
Makes computers understand long sentences faster.
Graph-Based Spectral Decomposition for Parameter Coordination in Language Model Fine-Tuning
Machine Learning (CS)
Teaches computers to learn faster and better.
From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models
Computation and Language
Makes computers understand words much faster.