How Smoothing is N-simplicial Attention?
By: Alexandre Dussolle, Pietro Liò
Moving from pure multilayer perceptrons (MLPs) to a learnable graph message-passing mechanism at each layer has been foundational to state-of-the-art results, despite the computational trade-off (e.g. GATs or Transformers). Going a step further, in this work we introduce N-simplicial attention, which extends pairwise token similarity to higher-order interactions, and adapt it to Rotary Position Embeddings (RoPE). To manage the increased complexity, we propose a cost-effective simplex selection that lets the model focus its computation on the most task-sensitive interactions. Beyond these core mechanisms, we study how much smoothing N-simplicial attention induces by deriving a Lipschitz upper bound and by demonstrating that, despite opening attention message-passing to higher-order interactions, it still suffers from over-smoothing on its own.
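The abstract only names the mechanism, so the sketch below illustrates what higher-order attention can look like in the simplest case (2-simplices): each query scores pairs of keys through a trilinear form rather than single keys through a dot product. This is a hedged, illustrative reconstruction, not the authors' implementation; the function name, the elementwise pairwise value aggregation, the scaling factor, and all weight shapes are assumptions.

```python
# Minimal sketch of single-head 2-simplicial attention (illustrative only,
# not the paper's exact formulation). Each query attends to pairs of keys,
# i.e. to 2-simplices (triangles) of tokens, instead of to single keys.
import torch

def two_simplicial_attention(x, W_q, W_k1, W_k2, W_v1, W_v2):
    """x: (n, d) token embeddings; returns (n, d) updated embeddings."""
    n, d = x.shape
    q  = x @ W_q          # queries           (n, d)
    k1 = x @ W_k1         # first key stream  (n, d)
    k2 = x @ W_k2         # second key stream (n, d)
    v1 = x @ W_v1         # first value stream
    v2 = x @ W_v2         # second value stream

    # Trilinear score s[i, j, l] = sum_c q[i,c] * k1[j,c] * k2[l,c]:
    # pairwise dot-product similarity generalized to one query and two keys.
    scores = torch.einsum('ic,jc,lc->ijl', q, k1, k2) / d ** 0.5

    # Softmax over all (j, l) key pairs for each query i.
    attn = torch.softmax(scores.reshape(n, -1), dim=-1).reshape(n, n, n)

    # Aggregate a pairwise value (here an elementwise product of the two
    # value vectors) over the attended 2-simplices.
    pair_values = torch.einsum('jc,lc->jlc', v1, v2)       # (n, n, d)
    return torch.einsum('ijl,jlc->ic', attn, pair_values)  # (n, d)

if __name__ == "__main__":
    torch.manual_seed(0)
    n, d = 8, 16
    x = torch.randn(n, d)
    params = [torch.randn(d, d) / d ** 0.5 for _ in range(5)]
    print(two_simplicial_attention(x, *params).shape)  # torch.Size([8, 16])
```

Materialising the full score tensor in this naive form costs O(n^3 · d) time and O(n^3) memory for a sequence of length n, which is precisely why the paper proposes a cost-effective simplex selection that restricts computation to the most task-sensitive interactions.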
Similar Papers
Making Every Head Count: Sparse Attention Without the Speed-Performance Trade-off
Machine Learning (CS)
A sparse-attention method that speeds up long-context processing without sacrificing performance.
Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
Machine Learning (CS)
A faster attention algorithm for rotary position embeddings, combining the polynomial method with the fast Fourier transform.
Wavy Transformer
Machine Learning (CS)
A transformer variant designed to counteract over-smoothing of token representations across layers.