Benchmarking Ophthalmology Foundation Models for Clinically Significant Age-Related Macular Degeneration Detection
By: Benjamin A. Cohen, Jonathan Fhima, Meishar Meisel, and more
Potential Business Impact:
Helps doctors spot eye disease from retinal photographs.
Self-supervised learning (SSL) has enabled Vision Transformers (ViTs) to learn robust representations from large-scale natural image datasets, enhancing their generalization across domains. In retinal imaging, foundation models pretrained on either natural or ophthalmic data have shown promise, but the benefits of in-domain pretraining remain uncertain. To investigate this, we benchmark six SSL-pretrained ViTs on seven digital fundus image (DFI) datasets totaling 70,000 expert-annotated images for the task of moderate-to-late age-related macular degeneration (AMD) identification. Our results show that iBOT pretrained on natural images achieves the highest out-of-distribution generalization, with AUROCs of 0.80-0.97, outperforming domain-specific models, which achieved AUROCs of 0.78-0.96 and a baseline ViT-L with no pretraining, which achieved AUROCs of 0.68-0.91. These findings highlight the value of foundation models in improving AMD identification and challenge the assumption that in-domain pretraining is necessary. Furthermore, we release BRAMD, an open-access dataset (n=587) of DFIs with AMD labels from Brazil.
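The abstract's evaluation protocol amounts to scoring each pretrained model's predicted probabilities on held-out fundus-image test sets with AUROC and comparing the resulting ranges. A minimal sketch of that comparison, using synthetic stand-in data rather than the paper's datasets or models (all model scores and data here are illustrative, not the authors' results):

```python
# Sketch of AUROC-based model comparison for binary AMD identification.
# Labels and probabilities are synthetic; in the paper these would come
# from expert-annotated DFI datasets and SSL-pretrained ViT classifiers.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def evaluate_model(y_true, y_prob):
    """AUROC for binary moderate-to-late AMD identification."""
    return roc_auc_score(y_true, y_prob)

# Synthetic stand-in for one expert-annotated test set (1 = AMD present).
y_true = rng.integers(0, 2, size=1000)

# Hypothetical probabilities from two models: one whose scores track the
# labels closely (e.g. a strong pretrained model) and a noisier baseline.
strong = np.clip(0.7 * y_true + 0.3 * rng.random(1000), 0.0, 1.0)
weak = np.clip(0.3 * y_true + 0.7 * rng.random(1000), 0.0, 1.0)

print(f"strong model AUROC: {evaluate_model(y_true, strong):.2f}")
print(f"weak model AUROC:   {evaluate_model(y_true, weak):.2f}")
```

In the paper, this comparison is repeated across seven DFI test sets, yielding the per-model AUROC ranges quoted above (e.g. 0.80-0.97 for iBOT).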
Similar Papers
When Do Domain-Specific Foundation Models Justify Their Cost? A Systematic Evaluation Across Retinal Imaging Tasks
Image and Video Processing
Smaller computer models see eye diseases better.
Functional Localization Enforced Deep Anomaly Detection Using Fundus Images
CV and Pattern Recognition
Finds eye diseases in pictures better.
Generalist versus Specialist Vision Foundation Models for Ocular Disease and Oculomics
Image and Video Processing
Helps doctors find eye diseases better.