Attention on the Sphere
By: Boris Bonev, Max Rietmann, Andrea Paris, and more
Potential Business Impact:
Helps computers accurately process data that lives on spherical surfaces, such as weather, climate, and panoramic imagery.
We introduce a generalized attention mechanism for spherical domains, enabling Transformer architectures to natively process data defined on the two-dimensional sphere, a critical need in fields such as atmospheric physics, cosmology, and robotics, where preserving spherical symmetries and topology is essential for physical accuracy. By integrating numerical quadrature weights into the attention mechanism, we obtain a geometrically faithful spherical attention that is approximately rotationally equivariant, providing strong inductive biases and leading to better performance than Cartesian approaches. To further enhance both scalability and model performance, we propose neighborhood attention on the sphere, which confines interactions to geodesic neighborhoods. This approach reduces computational complexity and introduces an additional inductive bias toward locality, while retaining the symmetry properties of our method. We provide optimized CUDA kernels and memory-efficient implementations to ensure practical applicability. The method is validated on three diverse tasks: simulating the shallow water equations on the rotating sphere, spherical image segmentation, and spherical depth estimation. Across all tasks, our spherical Transformers consistently outperform their planar counterparts, highlighting the advantage of geometric priors for learning on spherical domains.
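The core idea described in the abstract, discretizing a continuous attention integral over the sphere with quadrature weights and optionally restricting interactions to geodesic neighborhoods, can be sketched compactly. The following is a minimal illustrative sketch in PyTorch, not the paper's optimized CUDA implementation; the function names, the equiangular grid, the sin(theta) quadrature rule, and all shapes are assumptions made for this example.

```python
import torch

def equiangular_quadrature(nlat: int, nlon: int) -> torch.Tensor:
    """Quadrature weights for an equiangular lat-lon grid on S^2.

    Weights proportional to sin(theta) (colatitude) make a weighted sum
    over grid points approximate the surface integral over the sphere.
    """
    theta = torch.linspace(0.0, torch.pi, nlat)          # colatitude
    w = torch.sin(theta)[:, None].expand(nlat, nlon)     # (nlat, nlon)
    return (w * (torch.pi / nlat) * (2 * torch.pi / nlon)).flatten()

def geodesic_neighborhood_mask(xyz: torch.Tensor, radius: float) -> torch.Tensor:
    """Boolean mask of point pairs within geodesic distance `radius`.

    xyz: (n, 3) unit vectors of the grid points. Each point's neighborhood
    always contains the point itself (geodesic distance zero).
    """
    cosine = (xyz @ xyz.T).clamp(-1.0, 1.0)
    return torch.arccos(cosine) <= radius

def spherical_attention(q, k, v, quad_w, mask=None):
    """Quadrature-weighted (neighborhood) attention on the sphere.

    q, k, v: (n, d) features at n grid points; quad_w: (n,) weights.
    Weighting the key axis by quad_w turns the softmax normalization into
    a discretized integral over S^2, the geometric fidelity the abstract
    credits for approximate rotational equivariance.
    """
    logits = (q @ k.T) / q.shape[-1] ** 0.5              # (n, n)
    logits = logits - logits.amax(dim=-1, keepdim=True)  # numerical stability
    weights = torch.exp(logits) * quad_w                 # quadrature on keys
    if mask is not None:
        weights = weights * mask                         # geodesic locality
    return (weights / weights.sum(-1, keepdim=True)) @ v

if __name__ == "__main__":
    # Toy usage on a 16 x 32 equiangular grid with hypothetical features.
    nlat, nlon, d = 16, 32, 8
    theta = torch.linspace(0.0, torch.pi, nlat)
    phi = torch.linspace(0.0, 2 * torch.pi, nlon + 1)[:-1]
    T, P = torch.meshgrid(theta, phi, indexing="ij")
    xyz = torch.stack([torch.sin(T) * torch.cos(P),
                       torch.sin(T) * torch.sin(P),
                       torch.cos(T)], dim=-1).reshape(-1, 3)
    q = k = v = torch.randn(nlat * nlon, d)
    out = spherical_attention(q, k, v,
                              equiangular_quadrature(nlat, nlon),
                              geodesic_neighborhood_mask(xyz, radius=0.5))
    print(out.shape)  # torch.Size([512, 8])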
Similar Papers
RiemannFormer: A Framework for Attention in Curved Spaces
Machine Learning (CS)
Lets AI attention work on data that lives in curved spaces.
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
Machine Learning (CS)
Helps computers handle patterns at many different scales at once.
Field-Space Attention for Structure-Preserving Earth System Transformers
Machine Learning (CS)
Makes weather forecasts more accurate and reliable.