
Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision

Published: December 5, 2025 | arXiv ID: 2512.05740v1

By: Lennart Maack, Julia-Kristin Graß, Lisa-Marie Toscha and more

Potential Business Impact:

Helps surgeons see and understand operations better.

Business Areas:

Image Recognition, Data and Analytics, Software

Recently, Vision Large Language Models (VLMs) have demonstrated high potential in computer-aided diagnosis and decision support. However, current VLMs show deficits in domain-specific surgical scene understanding, such as identifying and explaining anatomical landmarks during Complete Mesocolic Excision. Additionally, there is a need for locally deployable models to avoid leaking patient data to large VLMs hosted outside the clinic. We propose a privacy-preserving framework to distill knowledge from large, general-purpose LLMs into an efficient, local VLM. We generate an expert-supervised dataset by prompting a teacher LLM without sensitive images, using only textual context and binary segmentation masks for spatial information. This dataset is used for Supervised Fine-Tuning (SFT) and subsequent Direct Preference Optimization (DPO) of the locally deployable VLM. Our evaluation confirms that fine-tuning VLMs with our generated datasets increases surgical domain knowledge by a large margin compared to the respective base VLM. Overall, this work validates a data-efficient and privacy-conforming way to train a surgical-domain-optimized, locally deployable VLM for surgical scene understanding.
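The abstract names Direct Preference Optimization as the second training stage but does not give its configuration. As a rough illustration only, the standard DPO loss for a single preference pair can be sketched as below. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this sketch and are not taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) answer pair.

    Each argument is the summed log-probability of an answer under the
    policy being trained or under the frozen reference model (for
    example, the SFT checkpoint). beta=0.1 is an assumed value.
    """
    # How much more likely each answer became relative to the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy favours the chosen answer: the loss falls below log(2).
    print(dpo_loss(-5.0, -12.0, -10.0, -10.0))
```

In practice the log-probabilities would come from the VLM being fine-tuned and a frozen copy of it, and the loss would be averaged over a batch of preference pairs.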

Challenging Vision-Language Models with Surgical Data: A New Dataset and Broad Benchmarking Study

CV and Pattern Recognition

Helps doctors see better during surgery.

6 Jun 2025 1


Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence

CV and Pattern Recognition

AI helps doctors understand surgery better.

3 Apr 2025 1


SurgXBench: Explainable Vision-Language Model Benchmark for Surgery

CV and Pattern Recognition

Helps robot surgeons see and understand actions.

16 May 2025 0


Country of Origin

🇩🇪 Germany

Page Count

4 pages
