Evaluating Fundus-Specific Foundation Models for Diabetic Macular Edema Detection
By: Franco Javier Arellano, José Ignacio Orlando
Potential Business Impact:
Helps doctors spot eye problems from pictures.
Diabetic Macular Edema (DME) is a leading cause of vision loss among patients with Diabetic Retinopathy (DR). While deep learning has shown promising results for automatically detecting this condition from fundus images, its application remains challenging due to the limited availability of annotated data. Foundation Models (FM) have emerged as an alternative solution. However, it is unclear if they can cope with DME detection in particular. In this paper, we systematically compare different FM and standard transfer learning approaches for this task. Specifically, we compare the two most popular FM for retinal images--RETFound and FLAIR--and an EfficientNet-B0 backbone, across different training regimes and evaluation settings in IDRiD, MESSIDOR-2 and OCT-and-Eye-Fundus-Images (OEFI). Results show that despite their scale, FM do not consistently outperform fine-tuned CNNs in this task. In particular, an EfficientNet-B0 ranked first or second in terms of area under the ROC and precision/recall curves in most evaluation settings, with RETFound only showing promising results in OEFI. FLAIR, on the other hand, demonstrated competitive zero-shot performance, achieving notable AUC-PR scores when prompted appropriately. These findings reveal that FM might not be a good tool for fine-grained ophthalmic tasks such as DME detection even after fine-tuning, suggesting that lightweight CNNs remain strong baselines in data-scarce environments.
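The abstract ranks models by area under the ROC curve and area under the precision/recall curve. As a rough illustration of what those two metrics measure for a binary DME classifier, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names, the toy `labels` and `scores`, and the use of average precision as the AUC-PR estimator are all assumptions made for illustration.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example scores higher than a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, a common estimator of the area under the
    precision/recall curve. Assumes scores are distinct (no tie handling)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Hypothetical toy data: 1 = DME present, 0 = DME absent.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

print(f"AUC-ROC: {auc_roc(labels, scores):.4f}")
print(f"AUC-PR (average precision): {average_precision(labels, scores):.4f}")
```

Unlike AUC-ROC, the precision/recall metric depends on how many positive cases the dataset contains, which is one reason the two are often reported together when positives are scarce.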
Similar Papers
FusionFM: Fusing Eye-specific Foundational Models for Optimized Ophthalmic Diagnosis
CV and Pattern Recognition
Helps doctors find eye and body diseases from eye pictures.
When Do Domain-Specific Foundation Models Justify Their Cost? A Systematic Evaluation Across Retinal Imaging Tasks
Image and Video Processing
Smaller computer models see eye diseases better.
An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care
Image and Video Processing
Helps doctors find eye problems using AI.