Memory-Augmented SAM2 for Training-Free Surgical Video Segmentation
By: Ming Yin, Fu Wang, Xujiong Ye, and more
Potential Business Impact:
Helps robots see and track tools in surgery.
Surgical video segmentation is a critical task in computer-assisted surgery, essential for enhancing surgical quality and patient outcomes. Recently, the Segment Anything Model 2 (SAM2) framework has demonstrated remarkable advances in both image and video segmentation. However, the inherent limitations of SAM2's greedy memory-selection design are amplified by the unique properties of surgical videos (rapid instrument movement, frequent occlusion, and complex instrument-tissue interaction), diminishing its performance on complex, long videos. To address these challenges, we introduce Memory-Augmented SAM2 (MA-SAM2), a training-free video object segmentation strategy featuring novel context-aware and occlusion-resilient memory models. MA-SAM2 is robust to the occlusions and interactions that arise from complex instrument movements while maintaining segmentation accuracy throughout a video. A multi-target, single-loop, one-prompt inference scheme further improves tracking efficiency in multi-instrument videos. Without introducing any additional parameters or requiring further training, MA-SAM2 achieves performance improvements of 4.36% and 6.1% over SAM2 on the EndoVis2017 and EndoVis2018 datasets, respectively, demonstrating its potential for practical surgical applications.
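The abstract names two memory ideas that can be illustrated without the authors' code: rejecting occluded frames before they enter the memory bank, and keeping a diverse rather than merely recent set of memory frames in place of SAM2's greedy selection. Below is a minimal, hypothetical sketch of such a bank; the class name, thresholds, and redundancy-based eviction rule are assumptions for illustration, not MA-SAM2's actual implementation.

```python
# A minimal sketch (NOT the authors' code) of the two mechanisms the
# abstract describes. All names and thresholds here are hypothetical.
import numpy as np

class AugmentedMemoryBank:
    def __init__(self, capacity=7, occlusion_threshold=0.5):
        self.capacity = capacity                        # max frames retained in memory
        self.occlusion_threshold = occlusion_threshold  # hypothetical object-score cutoff
        self.embeddings = []                            # per-frame memory features

    def update(self, embedding, object_score):
        """Add a frame's memory feature unless the frame looks occluded."""
        # Occlusion-resilient: a frame with a low predicted object score
        # (e.g., an instrument hidden by tissue or another tool) never
        # enters memory, so it cannot corrupt later predictions.
        if object_score < self.occlusion_threshold:
            return
        self.embeddings.append(embedding)
        if len(self.embeddings) > self.capacity:
            self._evict()

    def _evict(self):
        # Context-aware: instead of dropping the oldest frame (FIFO, as a
        # greedy recency rule would), drop the frame most redundant with
        # the rest of the bank, keeping diverse appearances of the target.
        feats = np.stack([e / np.linalg.norm(e) for e in self.embeddings])
        sim = feats @ feats.T          # cosine similarity between memories
        np.fill_diagonal(sim, 0.0)
        redundancy = sim.sum(axis=1)   # total similarity to other memories
        self.embeddings.pop(int(np.argmax(redundancy)))

# Toy usage: the low-score frame (t=2) is rejected as occluded; the rest
# are capped at capacity by redundancy-based eviction.
bank = AugmentedMemoryBank(capacity=3)
rng = np.random.default_rng(0)
for t in range(6):
    bank.update(rng.standard_normal(256), object_score=0.9 if t != 2 else 0.1)
print(len(bank.embeddings))  # 3
```

Replacing FIFO eviction with redundancy-based eviction is one plausible reading of "context-aware"; the paper's actual selection rule, and its multi-target single-loop inference, may be implemented differently.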
Similar Papers
SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking
CV and Pattern Recognition
Helps surgeons see and track tools during operations.
MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection
CV and Pattern Recognition
Helps computers track moving things in videos.
Evaluating SAM2 for Video Semantic Segmentation
CV and Pattern Recognition
Tests how well computers can cut out objects in videos.