ViKANformer: Embedding Kolmogorov Arnold Networks in Vision Transformers for Pattern-Based Learning
By: Shreyas S, Akshath M
Potential Business Impact:
Makes image classifiers more accurate by giving Vision Transformers more expressive internal layers, at the cost of extra training time.
Vision Transformers (ViTs) have significantly advanced image classification by applying self-attention on patch embeddings. However, the standard MLP blocks in each Transformer layer may not capture complex nonlinear dependencies optimally. In this paper, we propose ViKANformer, a Vision Transformer in which we replace the MLP sub-layers with Kolmogorov-Arnold Network (KAN) expansions, including Vanilla KAN, Efficient-KAN, Fast-KAN, SineKAN, and FourierKAN, while also examining a Flash Attention variant. By leveraging the Kolmogorov-Arnold representation theorem, which guarantees that any multivariate continuous function can be expressed via sums and compositions of univariate continuous functions, we aim to boost representational power. Experimental results on MNIST demonstrate that SineKAN, Fast-KAN, and a well-tuned Vanilla KAN can achieve over 97% accuracy, albeit with increased training overhead. This trade-off suggests that KAN expansions can be beneficial when the extra computational cost is acceptable. We detail the expansions, report training/test accuracy and F1/ROC metrics, and provide pseudocode and hyperparameters for reproducibility. Finally, we compare ViKANformer to a simple MLP and a small CNN baseline on MNIST, illustrating the efficiency of Transformer-based methods even on a small-scale dataset.
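For reference, the representation theorem underlying KANs states that any continuous f : [0,1]^n -> R can be written as

    f(x_1, ..., x_n) = \sum_{q=0}^{2n} \Phi_q \left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),

where the \Phi_q and \phi_{q,p} are continuous univariate functions; KAN layers make these univariate functions learnable.

The abstract's core architectural change, swapping each Transformer MLP sub-layer for a KAN expansion, can be sketched as below. This is a minimal illustrative PyTorch sketch under stated assumptions, not the authors' implementation: the SineKANLayer (its sine-basis parameterization, frequency initialization, and the KANEncoderBlock wiring) is a hypothetical stand-in for the paper's expansions.

    # Minimal sketch (PyTorch): a pre-norm ViT encoder block whose MLP
    # sub-layer is replaced by stacked KAN-style layers. The SineKAN-style
    # parameterization below is an illustrative assumption, not the paper's code.
    import torch
    import torch.nn as nn

    class SineKANLayer(nn.Module):
        """Hypothetical SineKAN-style layer: each input feature passes through
        learnable sine bases, whose outputs are then linearly mixed."""

        def __init__(self, dim_in: int, dim_out: int, num_freqs: int = 8):
            super().__init__()
            # Learnable frequencies, per-feature phases, and mixing weights
            # (initialization choices here are assumptions).
            self.freqs = nn.Parameter(torch.arange(1, num_freqs + 1, dtype=torch.float32))
            self.phase = nn.Parameter(torch.zeros(dim_in, num_freqs))
            self.weight = nn.Parameter(torch.randn(dim_out, dim_in * num_freqs) * 0.02)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (..., dim_in) -> sine features: (..., dim_in, num_freqs)
            feats = torch.sin(x.unsqueeze(-1) * self.freqs + self.phase)
            # Flatten the univariate expansions and mix: (..., dim_out)
            return feats.flatten(-2) @ self.weight.T

    class KANEncoderBlock(nn.Module):
        """Pre-norm Transformer encoder block with the MLP replaced by KAN layers."""

        def __init__(self, dim: int, heads: int = 4, hidden: int = 128):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim)
            # Two stacked KAN layers where a standard ViT uses Linear-GELU-Linear.
            self.kan = nn.Sequential(SineKANLayer(dim, hidden), SineKANLayer(hidden, dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            return x + self.kan(self.norm2(x))

    # Usage: MNIST with 4x4 patches gives 49 tokens per image.
    block = KANEncoderBlock(dim=64)
    tokens = torch.randn(2, 49, 64)   # (batch, patches, embed_dim)
    out = block(tokens)               # same shape: (2, 49, 64)

A full ViKANformer would stack several such blocks over patch embeddings plus positional encodings, exactly as in a standard ViT; only the feed-forward sub-layer changes.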
Similar Papers
Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer
CV and Pattern Recognition
Makes computer vision models smarter and faster.
Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?
Machine Learning (CS)
Makes AI understand pictures better by learning attention.
Enhancing Burmese News Classification with Kolmogorov-Arnold Network Head Fine-tuning
Computation and Language
Helps computers understand languages with fewer examples.