SAM-pose2seg: Pose-Guided Human Instance Segmentation in Crowds
By: Constantin Kolomiiets, Miroslav Purkrabek, Jiri Matas
Segment Anything (SAM) provides an unprecedented foundation for human segmentation, but it can struggle under occlusion, where pose keypoints may be partially or fully invisible. We adapt SAM 2.1 for pose-guided segmentation with minimal encoder modifications, retaining its strong generalization. Using a fine-tuning strategy called PoseMaskRefine, we incorporate highly visible pose keypoints into the iterative correction process originally employed by SAM, yielding improved robustness and accuracy across multiple datasets. During inference, we simplify prompting by selecting only the three keypoints with the highest visibility. This strategy reduces sensitivity to common errors, such as missing body parts or misclassified clothing, and allows accurate mask prediction from as few as a single keypoint. Our results demonstrate that pose-guided fine-tuning of SAM enables effective, occlusion-aware human segmentation while preserving the generalization capabilities of the original model. The code and pretrained models will be available at https://mirapurkrabek.github.io/BBox-MaskPose.
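The inference-time prompting strategy can be illustrated with a minimal sketch: pick the three keypoints with the highest visibility scores and pass them to SAM 2 as positive point prompts. This is not the authors' released code; it assumes the `sam2` package's `SAM2ImagePredictor` interface, a COCO-style (x, y, visibility) keypoint layout, and an illustrative checkpoint name.

```python
# Sketch of top-3-visible-keypoint prompting, assuming the sam2 package API.
import numpy as np
from sam2.sam2_image_predictor import SAM2ImagePredictor

def top_visible_keypoints(keypoints: np.ndarray, k: int = 3) -> np.ndarray:
    """keypoints: (17, 3) array of (x, y, visibility) in COCO pose format.
    Returns the (x, y) coordinates of the k most visible keypoints."""
    order = np.argsort(keypoints[:, 2])[::-1]           # sort by visibility, descending
    return keypoints[order[:k], :2]

# Placeholder inputs; in practice these come from the image and a pose estimator.
image = np.zeros((480, 640, 3), dtype=np.uint8)         # HxWx3 uint8 RGB frame
pose_keypoints = np.random.rand(17, 3) * [640, 480, 1]  # (x, y, visibility) triples

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")
predictor.set_image(image)

points = top_visible_keypoints(pose_keypoints)          # (3, 2) pixel coordinates
labels = np.ones(len(points), dtype=np.int32)           # all prompts are positive clicks
masks, scores, _ = predictor.predict(
    point_coords=points,
    point_labels=labels,
    multimask_output=False,                             # one mask for this person instance
)
```

Restricting the prompt to the most visible keypoints is what makes the method degrade gracefully: even when only a single keypoint survives occlusion, the same call still yields a usable mask.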