Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography
By: Yusdivia Molina-Román, David Gómez-Ortiz, Ernestina Menasalvas-Ruiz, and more
Potential Business Impact:
Automates breast density scoring, sharpening cancer risk assessment in screening mammography.
Mammographic breast density classification is essential for cancer risk assessment but remains challenging due to subjective interpretation and inter-observer variability. This study compares multimodal and CNN-based methods for automated classification using the BI-RADS system, evaluating BioMedCLIP and ConvNeXt across three learning scenarios: zero-shot classification, linear probing with textual descriptions, and fine-tuning with numerical labels. Results show that zero-shot classification achieved modest performance, while the fine-tuned ConvNeXt model outperformed the BioMedCLIP linear probe. Although linear probing demonstrated potential with pretrained embeddings, it was less effective than full fine-tuning. These findings suggest that despite the promise of multimodal learning, CNN-based models with end-to-end fine-tuning provide stronger performance for specialized medical imaging. The study underscores the need for more detailed textual representations and domain-specific adaptations in future radiology applications.
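The "linear probing" scenario described above trains only a single linear classifier on top of frozen pretrained embeddings. A minimal sketch of that idea, using NumPy with randomly simulated embeddings standing in for BioMedCLIP image features and the four BI-RADS density categories (A–D) as labels 0–3; all shapes, cluster spreads, and hyperparameters here are illustrative assumptions, not the paper's actual setup:

```python
# Sketch of a linear probe on frozen embeddings (assumed setup, not the
# authors' code): embeddings are simulated as class-clustered random
# vectors; a softmax-regression head is trained by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
n_per_class, dim, n_classes = 50, 512, 4  # 4 BI-RADS density categories

# Simulated frozen encoder output: each class clusters around a centroid.
centroids = rng.normal(size=(n_classes, dim))
X = np.vstack([c + 0.3 * rng.normal(size=(n_per_class, dim)) for c in centroids])
y = np.repeat(np.arange(n_classes), n_per_class)

# Linear probe: one weight matrix + bias, trained with full-batch
# gradient descent on the cross-entropy loss; the "encoder" stays frozen.
W = np.zeros((dim, n_classes))
b = np.zeros(n_classes)
onehot = np.eye(n_classes)[y]
for _ in range(200):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    grad = (probs - onehot) / len(X)                   # dL/dlogits
    W -= 1.0 * (X.T @ grad)
    b -= 1.0 * grad.sum(axis=0)

acc = (np.argmax(X @ W + b, axis=1) == y).mean()
print(f"linear-probe train accuracy: {acc:.2f}")
```

In full fine-tuning, by contrast, the encoder weights themselves are updated end to end, which is what the study found gave ConvNeXt its edge over the frozen-embedding probe.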
Similar Papers
Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models
CV and Pattern Recognition
Helps doctors find breast cancer earlier and better.
MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction
CV and Pattern Recognition
Helps doctors find breast cancer faster.
Proof of Concept for Mammography Classification with Enhanced Compactness and Separability Modules
Image and Video Processing
Helps doctors spot breast cancer better on scans.