Sliced ReLU attention: Quasi-linear contextual expressivity via sorting
By: Siwan Boufadène, François-Xavier Vialard
We introduce sliced ReLU attention, a new attention mechanism that departs structurally from both softmax and ReLU-based alternatives. Instead of applying a nonlinearity to pairwise dot products, we operate on one-dimensional projections of key–query differences and leverage sorting to obtain quasi-linear complexity. This construction yields a differentiable, non-symmetric kernel that can be computed in O(n log n) time through a sorting procedure, making it suitable for very long contexts. Beyond computational benefits, the model retains strong theoretical expressive power: we establish two in-context expressivity results, previously known for softmax attention, showing that sliced ReLU attention preserves the ability to perform nontrivial sequence-to-sequence disentangling tasks and satisfies a contextual universal approximation property. Finally, we illustrate the potential practical interest of this kernel in small-scale experiments.
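The abstract only sketches the mechanism, so the NumPy snippet below is a minimal illustration of the sorting trick that makes ReLU on one-dimensional key–query differences computable in O(n log n): after projecting queries and keys onto a direction θ, the sum Σ_j ReLU(⟨θ, q_i⟩ − ⟨θ, k_j⟩) v_j reduces to prefix sums over keys sorted along that direction. The kernel form, the use of random directions, the function names, and the absence of normalization or causal masking are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def sliced_relu_attention_1d(q_proj, k_proj, v):
    """One slice: queries and keys already projected onto a single direction.

    Computes, for every query i,
        out_i = sum_j ReLU(q_proj[i] - k_proj[j]) * v[j]
    in O(n log n) via sorting + prefix sums instead of the naive O(n^2) sum.
    """
    n, d_v = v.shape
    order = np.argsort(k_proj)                  # sort keys along the slice
    k_sorted = k_proj[order]
    v_sorted = v[order]

    # Prefix sums over the sorted keys: sum of v_j and sum of k_j * v_j
    # (a leading zero row handles queries smaller than every key).
    zero = np.zeros((1, d_v))
    cum_v = np.concatenate([zero, np.cumsum(v_sorted, axis=0)], axis=0)
    cum_kv = np.concatenate([zero, np.cumsum(k_sorted[:, None] * v_sorted, axis=0)], axis=0)

    # For each query, count keys with k_proj[j] <= q_proj[i]; only those
    # contribute, since ReLU(q - k) vanishes when k > q.
    idx = np.searchsorted(k_sorted, q_proj, side="right")

    # sum_j ReLU(q_i - k_j) v_j = q_i * sum_{k_j <= q_i} v_j - sum_{k_j <= q_i} k_j v_j
    return q_proj[:, None] * cum_v[idx] - cum_kv[idx]

def sliced_relu_attention(Q, K, V, thetas):
    """Average the 1-D computation over projection directions `thetas`
    of shape (num_slices, d); each slice costs O(n log n)."""
    out = np.zeros_like(V, dtype=float)
    for theta in thetas:
        out += sliced_relu_attention_1d(Q @ theta, K @ theta, V)
    return out / len(thetas)

# Usage sketch with random directions (hypothetical setup, no masking).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, d_v, num_slices = 1024, 64, 64, 8
    Q = rng.standard_normal((n, d))
    K = rng.standard_normal((n, d))
    V = rng.standard_normal((n, d_v))
    thetas = rng.standard_normal((num_slices, d))
    print(sliced_relu_attention(Q, K, V, thetas).shape)  # (1024, 64)
```

The quasi-linear cost comes entirely from the one-dimensional structure: sorting and searchsorted are O(n log n), the prefix sums are O(n), and a naive quadratic loop over query–key pairs is never formed.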