A Vision-Language Model for Focal Liver Lesion Classification
By: Song Jian, Hu Yuchang, Wang Hui and more
Potential Business Impact:
Helps doctors find liver problems with fewer pictures.
Accurate classification of focal liver lesions is crucial for diagnosis and treatment in hepatology. However, traditional supervised deep learning models depend on large-scale annotated datasets, which are often limited in medical imaging. Recently, vision-language models (VLMs) such as the Contrastive Language-Image Pre-training (CLIP) model have been applied to image classification. Compared with a conventional convolutional neural network (CNN), which classifies images based on visual information only, a VLM leverages multimodal learning with text and images, allowing it to learn effectively even with a limited amount of labeled data. Inspired by CLIP, we propose Liver-VLM, a model specifically designed for focal liver lesion (FLL) classification. First, Liver-VLM incorporates class information into the text encoder without introducing additional inference overhead. Second, by calculating the pairwise cosine similarities between image and text embeddings and optimizing the model with a cross-entropy loss, Liver-VLM effectively aligns image features with class-level text features. Experimental results on the MPCT-FLLs dataset demonstrate that Liver-VLM outperforms both the standard CLIP and MedCLIP models in terms of accuracy and Area Under the Curve (AUC). Further analysis shows that using a lightweight ResNet18 backbone enhances classification performance, particularly under data-constrained conditions.
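The alignment step described in the abstract (pairwise cosine similarities between image and class-level text embeddings, trained with a cross-entropy loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the temperature value of 0.07, the embedding size, and the toy embeddings are all assumptions made for illustration, and no encoder is trained here.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so a dot product equals cosine similarity.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def class_logits(image_emb, text_emb, temperature=0.07):
    # Pairwise cosine similarity between every image embedding and every
    # class-level text embedding, divided by an assumed temperature.
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    return img @ txt.T / temperature

def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy stand-ins for encoder outputs (hypothetical sizes, not real data).
rng = np.random.default_rng(0)
num_classes, dim = 4, 16
text_emb = np.eye(num_classes, dim)        # one embedding per lesion class
labels = np.array([0, 2, 3, 1, 2])         # ground-truth class per image
image_emb = text_emb[labels] + 0.1 * rng.normal(size=(len(labels), dim))

logits = class_logits(image_emb, text_emb)  # shape: (num_images, num_classes)
loss = cross_entropy(logits, labels)        # scalar a trainer would minimize
pred = logits.argmax(axis=1)                # predicted class per image
```

In this sketch the text embeddings depend only on the class, so they could be computed once and cached. That would be consistent with the abstract's claim of no additional inference overhead, although the paper's actual mechanism may differ.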
Similar Papers
Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding
CV and Pattern Recognition
Helps doctors find diseases in X-rays better.
Semantic-Clipping: Efficient Vision-Language Modeling with Semantic-Guided Visual Selection
CV and Pattern Recognition
Helps computers understand pictures better by focusing on important parts.
Image Recognition with Vision and Language Embeddings of VLMs
CV and Pattern Recognition
Helps computers understand pictures better with words or just sight.