Retrieval-Augmented VLMs for Multimodal Melanoma Diagnosis
By: Jihyun Moon, Charmgil Hong
Potential Business Impact:
Helps doctors detect skin cancer earlier and more accurately.
Accurate and early diagnosis of malignant melanoma is critical for improving patient outcomes. While convolutional neural networks (CNNs) have shown promise in dermoscopic image analysis, they often neglect clinical metadata and require extensive preprocessing. Vision-language models (VLMs) offer a multimodal alternative but struggle to capture clinical specificity when trained on general-domain data. To address this, we propose a retrieval-augmented VLM framework that incorporates semantically similar patient cases into the diagnostic prompt. Our method enables informed predictions without fine-tuning and significantly improves classification accuracy and error correction over conventional baselines. These results demonstrate that retrieval-augmented prompting provides a robust strategy for clinical decision support.
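The abstract does not include an implementation, but the loop it describes — embed the new case, retrieve semantically similar labeled cases, and fold them into the diagnostic prompt for a frozen VLM — can be sketched roughly as below. All function names, the embedding step, and the prompt wording are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of retrieval-augmented prompting for melanoma
# diagnosis. Assumes some multimodal encoder and a chat-style VLM
# API exist; both are hypothetical placeholders here.
import numpy as np

def embed_case(image_path: str, metadata: dict) -> np.ndarray:
    """Placeholder: encode a dermoscopic image plus clinical metadata
    (age, sex, lesion site, ...) into one vector, e.g. with a
    CLIP-style encoder. Not specified by the paper."""
    raise NotImplementedError

def retrieve_similar(query_vec: np.ndarray, case_vecs: np.ndarray,
                     cases: list[dict], k: int = 3) -> list[dict]:
    """Return the k reference cases most similar to the query,
    by cosine similarity over L2-normalized embeddings."""
    q = query_vec / np.linalg.norm(query_vec)
    m = case_vecs / np.linalg.norm(case_vecs, axis=1, keepdims=True)
    top = np.argsort(m @ q)[::-1][:k]
    return [cases[i] for i in top]

def build_prompt(query_meta: dict, neighbors: list[dict]) -> str:
    """Insert retrieved, labeled patient cases into the diagnostic
    prompt so the VLM can condition on clinically similar examples."""
    lines = ["You are assisting with melanoma diagnosis.",
             "Reference cases with confirmed diagnoses:"]
    for n in neighbors:
        lines.append(f"- metadata: {n['metadata']}, diagnosis: {n['label']}")
    lines.append(f"New patient metadata: {query_meta}")
    lines.append("Given the attached dermoscopic image, answer "
                 "'malignant' or 'benign' with a brief rationale.")
    return "\n".join(lines)
```

Because the retrieved cases act as in-context examples in the prompt, the VLM can be used frozen, which is what lets the approach avoid fine-tuning.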
Similar Papers
Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models
CV and Pattern Recognition
Helps doctors detect breast cancer earlier and more accurately.
MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction
CV and Pattern Recognition
Helps doctors find breast cancer faster.
Image Recognition with Vision and Language Embeddings of VLMs
CV and Pattern Recognition
Helps computers recognize images using sight alone or sight combined with words.