SegDINO: An Efficient Design for Medical and Natural Image Segmentation with DINO-V3
By: Sicheng Yang, Hongqiu Wang, Zhaohu Xing and more
Potential Business Impact:
Makes computers better at finding objects in pictures.
The DINO family of self-supervised vision models has shown remarkable transferability, yet effectively adapting their representations for segmentation remains challenging. Existing approaches often rely on heavy decoders with multi-scale fusion or complex upsampling, which introduce substantial parameter overhead and computational cost. In this work, we propose SegDINO, an efficient segmentation framework that couples a frozen DINOv3 backbone with a lightweight decoder. SegDINO extracts multi-level features from the pretrained encoder, aligns them to a common resolution and channel width, and utilizes a lightweight MLP head to directly predict segmentation masks. This design minimizes trainable parameters while preserving the representational power of foundation features. Extensive experiments across six benchmarks, including three medical datasets (TN3K, Kvasir-SEG, ISIC) and three natural image datasets (MSD, VMD-D, ViSha), demonstrate that SegDINO consistently achieves state-of-the-art performance compared to existing methods. Code is available at https://github.com/script-Yang/SegDINO.
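The decoder design described in the abstract has three steps: take features from several layers of a frozen backbone, align them to a common resolution and channel width, then predict the mask with a small MLP. The NumPy sketch below illustrates those steps under stated assumptions. It is not the authors' implementation; the linked repository has that. The backbone is replaced by random arrays standing in for frozen DINOv3 layer outputs. Several details are hypothetical choices made for illustration only:

- the names `LightweightDecoder`, `resize_nearest`, and `project_channels`;
- the use of four layers at 384 channels;
- a patch size of 16 on a 224x224 input;
- nearest-neighbour resizing and randomly initialised weights.

```python
import numpy as np


def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) map to (C, out_h, out_w)."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows][:, :, cols]


def project_channels(feat, weight):
    """Per-pixel linear map (a 1x1 conv): (C_in, H, W) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, feat)


class LightweightDecoder:
    """Hypothetical sketch: per-level projection, alignment, concat, MLP head.

    Only these weights would be trained; the backbone stays frozen.
    """

    def __init__(self, in_dims, width, num_classes, hidden, rng):
        # One projection per feature level to a shared channel width.
        self.proj = [rng.normal(0.0, 0.02, size=(width, d)) for d in in_dims]
        cat_dim = width * len(in_dims)
        # Two-layer per-pixel MLP head producing class logits.
        self.w1 = rng.normal(0.0, 0.02, size=(hidden, cat_dim))
        self.b1 = np.zeros((hidden, 1, 1))
        self.w2 = rng.normal(0.0, 0.02, size=(num_classes, hidden))
        self.b2 = np.zeros((num_classes, 1, 1))

    def num_trainable_params(self):
        arrays = self.proj + [self.w1, self.b1, self.w2, self.b2]
        return sum(a.size for a in arrays)

    def __call__(self, feats, out_size):
        # Use the first level's token grid as the common resolution.
        th, tw = feats[0].shape[1:]
        aligned = [
            resize_nearest(project_channels(f, w), th, tw)
            for f, w in zip(feats, self.proj)
        ]
        x = np.concatenate(aligned, axis=0)
        h = np.maximum(np.einsum("oc,chw->ohw", self.w1, x) + self.b1, 0.0)
        logits = np.einsum("oc,chw->ohw", self.w2, h) + self.b2
        # Upsample logits back to the input image size.
        return resize_nearest(logits, *out_size)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for frozen features from 4 layers (assumed 14x14 token grid).
    feats = [rng.normal(size=(384, 14, 14)) for _ in range(4)]
    decoder = LightweightDecoder([384] * 4, width=64, num_classes=1,
                                 hidden=128, rng=rng)
    logits = decoder(feats, out_size=(224, 224))
    mask = logits > 0  # binary foreground mask
    print(logits.shape)  # (1, 224, 224)
    print(decoder.num_trainable_params())
```

In this sketch the per-level projections and the two-layer head are the only parameters. The frozen backbone adds none, and that is where the small trainable budget described in the abstract comes from. For this hypothetical configuration the decoder holds 131,329 parameters.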
Similar Papers
SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features
CV and Pattern Recognition
Helps computers understand 3D shapes using 2D pictures.
Dino U-Net: Exploiting High-Fidelity Dense Features from Foundation Models for Medical Image Segmentation
CV and Pattern Recognition
Helps doctors see inside bodies better.
DINOv2-powered Few-Shot Semantic Segmentation: A Unified Framework via Cross-Model Distillation and 4D Correlation Mining
CV and Pattern Recognition
Teaches computers to recognize new things with few examples.