Revolutionizing Precise Low Back Pain Diagnosis via Contrastive Learning
By: Thanh Binh Le, Hoang Nhat Khang Vo, Tan-Ha Mai, and more
Potential Business Impact:
Helps doctors diagnose back pain from scans and written reports.
Low back pain affects millions worldwide, driving the need for robust diagnostic models that can jointly analyze complex medical images and accompanying text reports. We present LumbarCLIP, a novel multimodal framework that leverages contrastive language-image pretraining to align lumbar spine MRI scans with corresponding radiological descriptions. Built upon a curated dataset containing axial MRI views paired with expert-written reports, LumbarCLIP integrates vision encoders (ResNet-50, Vision Transformer, Swin Transformer) with a BERT-based text encoder to extract dense representations. These are projected into a shared embedding space via learnable projection heads, configurable as linear or non-linear, and normalized to facilitate stable contrastive training using a soft CLIP loss. Our model achieves state-of-the-art performance on downstream classification, reaching up to 95.00% accuracy and 94.75% F1-score on the test set, despite inherent class imbalance. Extensive ablation studies demonstrate that linear projection heads yield more effective cross-modal alignment than non-linear variants. LumbarCLIP offers a promising foundation for automated musculoskeletal diagnosis and clinical decision support.
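To make the training setup concrete, below is a minimal PyTorch sketch of the contrastive alignment the abstract describes: encoder features are projected into a shared space (linear or non-linear head), L2-normalized, and trained with a soft CLIP loss. The projection dimension (256), temperature (0.07), and the soft-target formulation (targets derived from intra-modal similarities instead of a hard identity matrix) are illustrative assumptions, not the paper's confirmed hyperparameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Maps encoder features into the shared embedding space.
    The paper's ablation compares linear vs. non-linear variants."""
    def __init__(self, in_dim: int, out_dim: int = 256, linear: bool = True):
        super().__init__()
        if linear:
            self.proj = nn.Linear(in_dim, out_dim)
        else:
            self.proj = nn.Sequential(
                nn.Linear(in_dim, out_dim),
                nn.GELU(),
                nn.Linear(out_dim, out_dim),
            )

    def forward(self, x):
        # L2-normalize so cosine similarity reduces to a dot product,
        # which stabilizes contrastive training.
        return F.normalize(self.proj(x), dim=-1)

def soft_clip_loss(img_emb, txt_emb, temperature=0.07):
    """One common 'soft' CLIP loss variant (assumed here): targets are
    softened using intra-modal similarities rather than one-hot labels."""
    logits = img_emb @ txt_emb.t() / temperature
    img_sim = img_emb @ img_emb.t()
    txt_sim = txt_emb @ txt_emb.t()
    targets = F.softmax((img_sim + txt_sim) / (2 * temperature), dim=-1)
    loss_i = F.cross_entropy(logits, targets)          # image -> text
    loss_t = F.cross_entropy(logits.t(), targets.t())  # text -> image
    return (loss_i + loss_t) / 2

# Toy usage with random stand-ins for encoder outputs, e.g. 2048-d
# pooled ResNet-50 features and 768-d BERT [CLS] features.
img_head = ProjectionHead(2048, linear=True)
txt_head = ProjectionHead(768, linear=True)
img_emb = img_head(torch.randn(8, 2048))
txt_emb = txt_head(torch.randn(8, 768))
print(soft_clip_loss(img_emb, txt_emb).item())

The soft targets reward partially matching image-report pairs instead of penalizing them as hard negatives, which is one plausible reason a soft loss helps on clinical data where distinct patients can share near-identical findings.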
Similar Papers
A Vision-Language Model for Focal Liver Lesion Classification
CV and Pattern Recognition
Helps doctors find liver problems with fewer images.
QwenCLIP: Boosting Medical Vision-Language Pretraining via LLM Embeddings and Prompt Tuning
CV and Pattern Recognition
Helps doctors understand long patient notes better.
uCLIP: Parameter-Efficient Multilingual Extension of Vision-Language Models with Unpaired Data
CV and Pattern Recognition
Helps computers understand pictures in many languages.