Distribution-Based Masked Medical Vision-Language Model Using Structured Reports
By: Shreyank N Gowda, Ruichi Zhang, Xiao Gu and more
Potential Business Impact:
Helps doctors read X-rays better by teaching AI with structured reports.
Medical image-language pre-training aims to align medical images with clinically relevant text to improve model performance on various downstream tasks. However, existing models often struggle with the variability and ambiguity inherent in medical data, limiting their ability to capture nuanced clinical information and uncertainty. This work introduces an uncertainty-aware medical image-text pre-training model that enhances generalization in medical image analysis. Building on previous methods and focusing on chest X-rays, our approach uses structured text reports generated by a large language model (LLM) to augment image data with clinically relevant context. Each report begins with a definition of the disease, followed by an 'appearance' section highlighting critical regions of interest, and closes with 'observations' and 'verdicts' that ground model predictions in clinical semantics. By modeling both inter- and intra-modal uncertainty, our framework captures the inherent ambiguity in medical images and text, yielding improved representations and state-of-the-art performance on multiple downstream tasks.
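The abstract names two ingredients: LLM-generated structured reports with fixed sections, and distribution-based (uncertainty-aware) embeddings for both modalities. Below is a minimal sketch of how such a setup is commonly realized, assuming the standard recipe of diagonal-Gaussian embeddings trained with a sampled InfoNCE objective; the report field names, `DistributionalHead`, and `probabilistic_contrastive_loss` are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical report schema mirroring the four sections the abstract
# describes (definition, appearance, observations, verdict); field names
# are illustrative, not taken from the paper.
REPORT_TEMPLATE = {
    "definition": "<what the disease is>",
    "appearance": "<critical regions of interest on the X-ray>",
    "observations": "<findings visible in this image>",
    "verdict": "<clinical conclusion>",
}

class DistributionalHead(nn.Module):
    """Maps a backbone feature to a diagonal Gaussian (mu, log-variance),
    so each image or report is a distribution rather than a point."""
    def __init__(self, in_dim: int, embed_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, embed_dim)
        self.logvar = nn.Linear(in_dim, embed_dim)

    def forward(self, x):
        return self.mu(x), self.logvar(x)

def sample_embedding(mu, logvar):
    # Reparameterization trick: z = mu + sigma * eps, keeping the
    # sampling step differentiable.
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def probabilistic_contrastive_loss(img_mu, img_logvar, txt_mu, txt_logvar,
                                   temperature: float = 0.07):
    """Symmetric InfoNCE over embeddings sampled from each modality's
    distribution; a stand-in for an uncertainty-aware alignment loss."""
    z_img = F.normalize(sample_embedding(img_mu, img_logvar), dim=-1)
    z_txt = F.normalize(sample_embedding(txt_mu, txt_logvar), dim=-1)
    logits = z_img @ z_txt.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

In training, each chest X-ray would be paired with its structured report, and a fresh embedding is sampled per step, so the learned variances can absorb ambiguous cases (intra-modal uncertainty) while the contrastive term handles cross-modal alignment.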
Similar Papers
More performant and scalable: Rethinking contrastive vision-language pre-training of radiology in the LLM era
CV and Pattern Recognition
Trains AI on X-rays and reports for better medical models.
VELVET-Med: Vision and Efficient Language Pre-training for Volumetric Imaging Tasks in Medicine
CV and Pattern Recognition
Helps doctors understand 3D scans better.
Comprehensive language-image pre-training for 3D medical image understanding
CV and Pattern Recognition
Helps doctors find disease in scans.