Can Large Language Models Challenge CNNs in Medical Image Analysis?
By: Shibbir Ahmed, Shahnewaz Karim Sakib, Anindya Bijoy Das
Potential Business Impact:
Helps doctors find sickness faster with smart computers.
This study presents a multimodal AI framework designed for the precise classification of medical diagnostic images. Using publicly available datasets, the proposed system compares the strengths of convolutional neural networks (CNNs) and different large language models (LLMs). This in-depth comparative analysis highlights key differences in diagnostic performance, execution efficiency, and environmental impact. Model evaluation was based on accuracy, F1-score, average execution time, average energy consumption, and estimated CO₂ emissions. The findings indicate that although CNN-based models can outperform various multimodal techniques that incorporate both images and contextual information, applying additional filtering on top of LLMs can lead to substantial performance gains. These findings highlight the transformative potential of multimodal AI systems to enhance the reliability, efficiency, and scalability of medical diagnostics in clinical settings.
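The evaluation metrics named above (accuracy, F1-score, and estimated CO₂ emissions derived from energy consumption) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names and the default grid carbon-intensity value are assumptions for the example.

```python
# Hypothetical sketch of the evaluation metrics described in the abstract.
# Binary labels assumed (1 = positive diagnosis, 0 = negative).

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the ground-truth labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def estimated_co2_grams(energy_kwh, grid_intensity_g_per_kwh=475.0):
    """Rough CO2 estimate: energy used times an assumed grid carbon
    intensity (475 g/kWh is an illustrative global-average figure)."""
    return energy_kwh * grid_intensity_g_per_kwh

# Example: comparing two hypothetical models on the same test labels.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))          # 4 of 6 correct
print(f1_score(y_true, y_pred))
print(estimated_co2_grams(0.12))         # grams CO2 for 0.12 kWh
```

In practice such per-model metrics would be paired with wall-clock timing and a measured (rather than assumed) energy reading to reproduce the comparison described in the paper.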
Similar Papers
A Comparison and Evaluation of Fine-tuned Convolutional Neural Networks to Large Language Models for Image Classification and Segmentation of Brain Tumors on MRI
CV and Pattern Recognition
Computers can't yet read brain scans well.
Performance of Large Language Models in Supporting Medical Diagnosis and Treatment
Computation and Language
AI helps doctors diagnose illnesses and plan treatments.
Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis
CV and Pattern Recognition
Helps doctors understand cancer treatment images better.