CNN and ViT Efficiency Study on Tiny ImageNet and DermaMNIST Datasets
By: Aidar Amangeldi, Angsar Taigonyrov, Muhammad Huzaid Jawad, and more
Potential Business Impact:
Makes AI see pictures faster and with less power.
This study evaluates the trade-offs between convolutional and transformer-based architectures on medical and general-purpose image classification benchmarks. Using ResNet-18 as the baseline, we apply a fine-tuning strategy to four Vision Transformer variants (Tiny, Small, Base, Large) on DermaMNIST and Tiny ImageNet. Our goal is to reduce inference latency and model complexity while keeping accuracy degradation acceptable. Through systematic hyperparameter variation, we demonstrate that appropriately fine-tuned Vision Transformers can match or exceed the baseline's accuracy, achieve faster inference, and operate with fewer parameters, highlighting their viability for deployment in resource-constrained environments.
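To make the comparison concrete, below is a minimal sketch (not the authors' released code) of the kind of evaluation the abstract describes: loading ResNet-18 and the four ViT variants, counting parameters, and timing single-image inference. The timm model names, 224x224 input resolution, batch size of 1, and the 7-class DermaMNIST head are assumptions about the setup, not details taken from the paper.

# Sketch of the parameter-count and inference-latency comparison described
# in the abstract. Assumes timm/torch are installed; exact training and
# measurement settings in the paper may differ.
import time

import timm
import torch

MODELS = [
    "resnet18",               # CNN baseline
    "vit_tiny_patch16_224",   # ViT variants compared against it
    "vit_small_patch16_224",
    "vit_base_patch16_224",
    "vit_large_patch16_224",
]
NUM_CLASSES = 7  # DermaMNIST has 7 skin-lesion classes

@torch.no_grad()
def latency_ms(model, device, runs=50, warmup=10):
    """Average forward-pass time in milliseconds for one 224x224 image."""
    x = torch.randn(1, 3, 224, 224, device=device)
    for _ in range(warmup):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1e3

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for name in MODELS:
    # Pretrained backbone with a fresh classification head, mirroring a
    # standard fine-tuning setup.
    model = timm.create_model(name, pretrained=True, num_classes=NUM_CLASSES)
    model.eval().to(device)
    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    print(f"{name:24s} {params_m:7.1f}M params  "
          f"{latency_ms(model, device):6.2f} ms/img")

Printed side by side, these numbers show the latency/size trade-off the abstract highlights: the smaller ViT variants can undercut ResNet-18 on parameter count while remaining competitive on speed.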
Similar Papers
A Comparative Study of Vision Transformers and CNNs for Few-Shot Rigid Transformation and Fundamental Matrix Estimation
CV and Pattern Recognition
Helps computers understand images with less data.
Comparative Analysis of Lightweight Deep Learning Models for Memory-Constrained Devices
CV and Pattern Recognition
Makes smart computer vision work on small phones.
Hybrid Convolution and Vision Transformer NAS Search Space for TinyML Image Classification
CV and Pattern Recognition
Makes tiny computers recognize pictures faster.