SAMannot: A Memory-Efficient, Local, Open-source Framework for Interactive Video Instance Segmentation based on SAM2
By: Gergely Dinya, András Gelencsér, Krisztina Kupán, and more
Potential Business Impact:
Makes video labeling faster while keeping footage private.
Current research workflows for precise video instance segmentation are forced into a compromise between labor-intensive manual curation, costly commercial platforms, and privacy-compromising cloud-based services. We present SAMannot, an open-source, local framework that integrates the Segment Anything Model 2 (SAM2) into a human-in-the-loop annotation workflow. To address the high resource requirements of foundation models, we modified the SAM2 dependency and implemented a processing layer that minimizes computational overhead and maximizes throughput, ensuring a highly responsive user interface. Key features include persistent instance identity management, an automated "lock-and-refine" workflow with barrier frames, and a mask-skeletonization-based auto-prompting mechanism. SAMannot generates research-ready datasets in YOLO and PNG formats alongside structured interaction logs. Verified on animal-behavior tracking use cases and on subsets of the LVOS and DAVIS benchmark datasets, the tool provides a scalable, private, and cost-effective alternative to commercial platforms for complex video annotation tasks.
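The abstract does not spell out how skeleton-based auto-prompting works, but the general idea can be sketched: thin a confirmed instance mask to its skeleton and sample a few skeleton pixels as positive point prompts, since skeleton points lie well inside the object and make robust clicks for re-seeding a tracker on a later frame. The function below is a minimal illustration, not SAMannot's actual code; the name `skeleton_prompts`, the `max_points` parameter, and the use of scikit-image are assumptions.

```python
import numpy as np
from skimage.morphology import skeletonize

def skeleton_prompts(mask: np.ndarray, max_points: int = 3) -> np.ndarray:
    """Derive positive point prompts from a binary instance mask.

    The mask is thinned to its morphological skeleton, and up to
    max_points evenly spaced skeleton pixels are returned as (x, y)
    coordinates, suitable as click-style prompts for a promptable
    segmenter such as SAM2.
    """
    skel = skeletonize(mask.astype(bool))
    ys, xs = np.nonzero(skel)          # skeleton pixel coordinates
    if len(xs) == 0:                   # empty mask: nothing to prompt with
        return np.empty((0, 2), dtype=np.int64)
    # Sample evenly along the skeleton rather than picking one centroid,
    # so elongated or curved objects still get well-spread prompts.
    idx = np.linspace(0, len(xs) - 1, num=min(max_points, len(xs)), dtype=int)
    return np.stack([xs[idx], ys[idx]], axis=1)  # (x, y) point order
```

In a lock-and-refine loop of the kind the abstract describes, points produced this way could plausibly re-seed SAM2 at a barrier frame without any user clicks, with manual prompts reserved for frames where the propagated mask drifts.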
Similar Papers
Evaluating SAM2 for Video Semantic Segmentation
CV and Pattern Recognition
Lets computers perfectly cut out any object in videos.
SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking
CV and Pattern Recognition
Helps surgeons see and track tools during operations.
Memory-Augmented SAM2 for Training-Free Surgical Video Segmentation
CV and Pattern Recognition
Helps robots see and track tools in surgery.