CardiacCLIP: Video-based CLIP Adaptation for LVEF Prediction in a Few-shot Manner
By: Yao Du, Jiarong Guo, Xiaomeng Li
Potential Business Impact:
Helps doctors measure heart health better from videos.
Echocardiography is a vital non-invasive modality for cardiac assessment, with left ventricular ejection fraction (LVEF) serving as a key indicator of heart function. Existing LVEF estimation methods depend on large-scale annotated video datasets, which are costly to build and limit adaptability across clinical settings. Recent vision-language models for echocardiography, such as EchoCLIP, apply image-to-text pretraining but fail to capture the temporal dynamics and localized cardiac structures essential for accurate diagnosis. To address these challenges, we propose CardiacCLIP, a video-based framework that enhances LVEF prediction through attention-based frame aggregation and multi-resolution input scaling. Specifically, we introduce Multi-Frame Learning (MFL), a novel attention-based mechanism that selectively fuses informative frames, and EchoZoom, a multi-scale feature extraction strategy that refines spatial representations of cardiac structures. As a novel adaptation of CLIP models for few-shot echocardiogram video analysis, our approach significantly improves diagnostic accuracy, reducing MAE by 2.07 on the EchoNet-Dynamic dataset under the 1-shot setting. The code is available at https://github.com/xmed-lab/CardiacCLIP.
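The abstract describes MFL as an attention mechanism that fuses per-frame features into a single video representation. The paper's actual module is not shown here, but the general idea can be sketched as a learned query attending over per-frame embeddings (e.g., from a CLIP image encoder); the function names, dimensions, and query vector below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(frame_feats, query):
    """Fuse per-frame features into one video feature via attention.

    frame_feats: (T, D) per-frame embeddings (e.g., CLIP image features).
    query: (D,) learnable query vector (hypothetical stand-in for MFL).
    Returns a (D,) attention-weighted combination of the frames.
    """
    # scaled dot-product scores, one per frame
    scores = frame_feats @ query / np.sqrt(frame_feats.shape[1])
    weights = softmax(scores)      # (T,) attention weights, sum to 1
    return weights @ frame_feats   # (D,) weighted frame average

# toy example: 8 frames with 16-dim embeddings
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))
q = rng.standard_normal(16)
video_feat = attention_fuse(feats, q)  # (16,) fused video embedding
```

In a CLIP-style pipeline, the fused video embedding would then be compared against text embeddings (e.g., LVEF prompts) exactly as a single image embedding would be, which is what makes this kind of frame aggregation a drop-in video adaptation.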
Similar Papers
Investigating Deep Learning Models for Ejection Fraction Estimation from Echocardiography Videos
CV and Pattern Recognition
Helps doctors quickly measure heart health from videos.
Video CLIP Model for Multi-View Echocardiography Interpretation
CV and Pattern Recognition
Helps doctors understand heart videos better.
Cardiac-CLIP: A Vision-Language Foundation Model for 3D Cardiac CT Images
Image and Video Processing
Helps doctors find heart problems from scans.