Score: 0

CliffordNet: All You Need is Geometric Algebra

Published: January 11, 2026 | arXiv ID: 2601.06793v1

By: Zhongping Ji

Modern computer vision architectures, from CNNs to Transformers, predominantly rely on the stacking of heuristic modules: spatial mixers (Attention/Conv) followed by channel mixers (FFNs). In this work, we challenge this paradigm by returning to mathematical first principles. We propose the \textbf{Clifford Algebra Network (CAN)}, also referred to as CliffordNet, a vision backbone grounded purely in Geometric Algebra. Instead of engineering separate modules for mixing and memory, we derive a unified interaction mechanism based on the \textbf{Clifford Geometric Product} ($uv = u \cdot v + u \wedge v$). This operation ensures algebraic completeness regarding the Geometric Product by simultaneously capturing feature coherence (via the generalized inner product) and structural variation (via the exterior wedge product). Implemented via an efficient sparse rolling mechanism with \textbf{strict linear complexity $\mathcal{O}(N)$}, our model reveals a surprising emergent property: the geometric interaction is so representationally dense that standard Feed-Forward Networks (FFNs) become redundant. Empirically, CliffordNet establishes a new Pareto frontier: our \textbf{Nano} variant achieves \textbf{76.41\%} accuracy on CIFAR-100 with only \textbf{1.4M} parameters, effectively matching the heavy-weight ResNet-18 (11.2M) with \textbf{$8\times$ fewer parameters}, while our \textbf{Base} variant sets a new SOTA for tiny models at \textbf{78.05\%}. Our results suggest that global understanding can emerge solely from rigorous, algebraically complete local interactions, potentially signaling a shift where \textit{geometry is all you need}. Code is available at https://github.com/ParaMind2025/CAN.

Category
Computer Science:
CV and Pattern Recognition