Task-Specific Knowledge Distillation from the Vision Foundation Model for Enhanced Medical Image Segmentation
By: Pengchen Liang, Haishan Huang, Bin Pu, and more
Potential Business Impact:
Teaches computers to outline organs and lesions in medical scans.
Large-scale pre-trained models, such as Vision Foundation Models (VFMs), have demonstrated impressive performance across various downstream tasks by transferring generalized knowledge, especially when target data is limited. However, their high computational cost and the domain gap between natural and medical images limit their practical application in medical segmentation tasks. Motivated by this, we pose the following question: "How can we effectively utilize the knowledge of large pre-trained VFMs to train a small, task-specific model for medical image segmentation when training data is limited?" To address this problem, we propose a novel and generalizable task-specific knowledge distillation (KD) framework. Our method fine-tunes the VFM on the target segmentation task to capture task-specific features before distilling the knowledge to smaller models, leveraging Low-Rank Adaptation (LoRA) to reduce the computational cost of fine-tuning. Additionally, we incorporate synthetic data generated by diffusion models to augment the transfer set, enhancing model performance in data-limited scenarios. Experimental results across five medical image datasets demonstrate that our method consistently outperforms task-agnostic KD and self-supervised pretraining approaches such as MoCo v3 and Masked Autoencoders (MAE). For example, on the KidneyUS dataset, our method achieves a 28% higher Dice score than task-agnostic KD when fine-tuned with 80 labeled samples; on the CHAOS dataset, it achieves an 11% improvement over MAE with 100 labeled samples. These results underscore the potential of task-specific KD for training accurate, efficient models for medical image segmentation in data-constrained settings.
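The abstract names three building blocks: LoRA fine-tuning of the VFM teacher, pixel-wise distillation into a compact student, and a diffusion-augmented transfer set. Below is a minimal PyTorch sketch of the first two; it illustrates the general technique only, not the authors' implementation, and every name and value in it (`LoRALinear`, `distillation_loss`, the rank, temperature, and loss-weight defaults) is a hypothetical placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pre-trained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no-op at start
        self.scale = alpha / rank

    def forward(self, x):
        # Low-rank path adds only rank * (in + out) trainable parameters per layer.
        return self.base(x) + self.scale * F.linear(F.linear(x, self.A), self.B)


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, lam=0.5):
    """Pixel-wise KD loss: KL to softened teacher maps plus cross-entropy to labels.

    Logits are (N, C, H, W); labels are (N, H, W) class indices.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                              # standard T^2 rescaling of the KD term
    hard = F.cross_entropy(student_logits, labels)
    return lam * soft + (1.0 - lam) * hard
```

In a setup like this, the LoRA wrapper would be applied to the teacher's projection layers (e.g., a ViT's attention projections) before task-specific fine-tuning, and the batches fed through `distillation_loss` would mix real images with diffusion-generated ones to enlarge the transfer set, as the abstract describes.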
Similar Papers
Agglomerating Large Vision Encoders via Distillation for VFSS Segmentation
CV and Pattern Recognition
Teaches small AI to see like big AI.
DistillMatch: Leveraging Knowledge Distillation from Vision Foundation Model for Multimodal Image Matching
CV and Pattern Recognition
Helps computers match pictures from different cameras.
Adapting Medical Vision Foundation Models for Volumetric Medical Image Segmentation via Active Learning and Selective Semi-supervised Fine-tuning
Image and Video Processing
Helps doctors find diseases in scans faster.