Comparative Analysis of Custom CNN Architectures versus Pre-trained Models and Transfer Learning: A Study on Five Bangladesh Datasets
By: Ibrahim Tanvir, Alif Ruslan, Sartaj Solaiman
Potential Business Impact:
Teaches computers to spot road damage perfectly.
This study presents a comprehensive comparative analysis of custom-built Convolutional Neural Networks (CNNs) against popular pre-trained architectures (ResNet-18 and VGG-16) using both feature extraction and transfer learning approaches. We evaluated these models across five diverse image classification datasets from Bangladesh: Footpath Vision, Auto Rickshaw Detection, Mango Image Classification, Paddy Variety Recognition, and Road Damage Detection. Our experimental results demonstrate that transfer learning with fine-tuning consistently outperforms both custom CNNs built from scratch and feature extraction methods, achieving accuracy improvements ranging from 3% to 76% across different datasets. Notably, ResNet-18 with fine-tuning achieved perfect 100% accuracy on the Road Damage BD dataset. While custom CNNs offer advantages in model size (3.4M parameters vs. 11-134M for pre-trained models) and training efficiency on simpler tasks, pre-trained models with transfer learning provide superior performance, particularly on complex classification tasks with limited training data. This research provides practical insights for practitioners in selecting appropriate deep learning approaches based on dataset characteristics, computational resources, and performance requirements.
Similar Papers
Performance Analysis of Image Classification on Bangladeshi Datasets
CV and Pattern Recognition
Makes computers better at recognizing pictures.
A Comparative Study of Custom CNNs, Pre-trained Models, and Transfer Learning Across Multiple Visual Datasets
CV and Pattern Recognition
Transfer learning finds best image matches.
Evolving CNN Architectures: From Custom Designs to Deep Residual Models for Diverse Image Classification and Detection Tasks
CV and Pattern Recognition
Builds smarter computer vision for different tasks.