Learning Anatomy from Multiple Perspectives via Self-supervision in Chest Radiographs
By: Ziyu Zhou, Haozhe Luo, Mohammad Reza Hosseinzadeh Taher, and more
Potential Business Impact:
Teaches computers to understand anatomy in chest X-rays.
Foundation models have been successful in natural language processing and computer vision because they are capable of capturing the underlying structures (the foundation) of their data. In medical imaging, however, the key foundation lies in human anatomy: these images directly depict the internal structures of the body, reflecting the consistency, coherence, and hierarchy of human anatomy. Existing self-supervised learning (SSL) methods often overlook these perspectives, limiting their ability to learn anatomical features effectively. To overcome this limitation, we built Lamps (learning anatomy from multiple perspectives via self-supervision), pre-trained on large-scale chest radiographs by harmoniously utilizing the consistency, coherence, and hierarchy of human anatomy as the supervision signal. Extensive experiments across 10 datasets, evaluated through fine-tuning and emergent-property analysis, demonstrate Lamps' superior robustness, transferability, and clinical potential compared to 10 baseline models. By learning from multiple perspectives, Lamps presents a unique opportunity for foundation models to develop meaningful, robust representations that are aligned with the structure of human anatomy.
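The abstract does not spell out how anatomical properties become a training signal, so here is a minimal sketch of one plausible formulation, assuming an anatomical-consistency objective: embeddings of the same anatomical region under two augmented views of a radiograph are pulled together with an InfoNCE-style contrastive loss. The names (`AnatomyEncoder`, `consistency_loss`) and the architecture are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the Lamps code): anatomical consistency as a
# self-supervised signal -- matching anatomical patches across two augmented
# views of a chest radiograph are treated as positives.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AnatomyEncoder(nn.Module):
    """Tiny CNN standing in for the backbone pre-trained on radiograph patches."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        return F.normalize(self.proj(h), dim=1)  # unit-norm embeddings


def consistency_loss(z1: torch.Tensor, z2: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: the i-th patch in view 1 matches the i-th in view 2."""
    logits = z1 @ z2.t() / temperature      # cosine similarities between views
    targets = torch.arange(z1.size(0))      # positive pairs lie on the diagonal
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    encoder = AnatomyEncoder()
    # Two augmented views of the same batch of grayscale 64x64 anatomical crops.
    view1 = torch.randn(16, 1, 64, 64)
    view2 = view1 + 0.05 * torch.randn_like(view1)
    loss = consistency_loss(encoder(view1), encoder(view2))
    loss.backward()                         # gradients flow into the encoder
    print(f"consistency loss: {loss.item():.4f}")
```

The coherence and hierarchy perspectives mentioned in the abstract would require additional objectives (e.g., over spatial ordering of regions or nested part-whole crops); the sketch above covers only the consistency term under the stated assumptions.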
Similar Papers
Multi Anatomy X-Ray Foundation Model
CV and Pattern Recognition
AI reads X-rays of any body part.
Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis
CV and Pattern Recognition
Helps doctors see hidden problems in X-rays.
Bridging Brain with Foundation Models through Self-Supervised Learning
Machine Learning (CS)
Lets computers understand brain signals without labels.