CNN and ViT Efficiency Study on Tiny ImageNet and DermaMNIST Datasets

Published: May 13, 2025 | arXiv ID: 2505.08259v1

By: Aidar Amangeldi, Angsar Taigonyrov, Muhammad Huzaid Jawad, and more

Potential Business Impact:

Enables image-recognition AI to run faster and with less computing power.

Business Areas:
Image Recognition, Data and Analytics, Software

This study evaluates the trade-offs between convolutional and transformer-based architectures on both medical and general-purpose image classification benchmarks. We use ResNet-18 as our baseline and introduce a fine-tuning strategy applied to four Vision Transformer variants (Tiny, Small, Base, Large) on DermaMNIST and Tiny ImageNet. Our goal is to reduce inference latency and model complexity with acceptable accuracy degradation. Through systematic hyperparameter variations, we demonstrate that appropriately fine-tuned Vision Transformers can match or exceed the baseline's performance, achieve faster inference, and operate with fewer parameters, highlighting their viability for deployment in resource-constrained environments.
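To give a sense of the model-complexity axis of this comparison, the snippet below estimates parameter counts for the four ViT variants the abstract names. This is a minimal sketch, not taken from the paper: it assumes the standard ViT-Tiny/Small/Base/Large configurations (embedding dimension and depth) and uses the common ~12·d² per-block approximation (4·d² for attention projections plus 8·d² for the MLP), ignoring biases, norms, and the patch-embedding and classification heads.

```python
# Rough encoder parameter counts for the four ViT variants the study
# compares; the ResNet-18 baseline has ~11.7M parameters for reference.
# Configurations below are the standard published ViT sizes, assumed here.

VIT_CONFIGS = {
    "ViT-Tiny":  {"dim": 192,  "depth": 12},
    "ViT-Small": {"dim": 384,  "depth": 12},
    "ViT-Base":  {"dim": 768,  "depth": 12},
    "ViT-Large": {"dim": 1024, "depth": 24},
}

def approx_params(dim: int, depth: int) -> int:
    """Approximate encoder size: ~12 * dim^2 weights per transformer block
    (attention QKV + output projection = 4*dim^2, MLP with 4x expansion = 8*dim^2)."""
    return 12 * depth * dim * dim

for name, cfg in VIT_CONFIGS.items():
    print(f"{name}: ~{approx_params(**cfg) / 1e6:.1f}M parameters")
# → ViT-Tiny ~5.3M, ViT-Small ~21.2M, ViT-Base ~84.9M, ViT-Large ~302.0M
```

These ballpark figures illustrate why the Tiny and Small variants are the natural candidates for matching ResNet-18's footprint while retaining transformer accuracy; exact counts depend on patch size, input resolution, and head configuration.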

Country of Origin
🇰🇿 Kazakhstan

Page Count
9 pages

Category
Computer Science:
CV and Pattern Recognition