Open Vocabulary Panoptic Segmentation With Retrieval Augmentation
By: Nafis Sadeq, Qingfeng Liu, Mostafa El-Khamy
Potential Business Impact:
Lets computers see any object, even new ones.
Given an input image and a set of class names, panoptic segmentation aims to assign a class label and an instance label to each pixel in the image. Open-vocabulary panoptic segmentation extends this task to arbitrary classes specified by the user. The challenge is that a panoptic segmentation system trained on a particular dataset typically does not generalize well to classes that are absent from its training data. In this work, we propose RetCLIP, a retrieval-augmented panoptic segmentation method that improves performance on unseen classes. In particular, we construct a masked segment feature database using paired image-text data. At inference time, we use masked segment features from the input image as query keys to retrieve similar features and their associated class labels from the database. Classification scores for each masked segment are assigned based on the similarity between the query features and the retrieved features. The retrieval-based classification scores are then combined with CLIP-based scores to produce the final output. We integrate our solution into a previous SOTA method (FC-CLIP). When trained on COCO, the proposed method achieves 30.9 PQ, 19.3 mAP, and 44.0 mIoU on the ADE20k dataset, an absolute improvement of +4.5 PQ, +2.5 mAP, and +10.0 mIoU over the baseline.
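The retrieval step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes cosine similarity as the similarity measure, a top-k vote weighted by similarity, and a linear mix of retrieval and CLIP scores; none of these choices is stated in the abstract. The names `retrieval_scores`, `combine_scores`, `k`, and `alpha`, as well as the toy two-dimensional features, are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieval_scores(query, database, class_names, k=3):
    """Score each class from the k database entries most similar to the query.

    `database` is a list of (feature, label) pairs. The top-k vote weighted
    by similarity is an assumption, not a detail taken from the paper.
    """
    ranked = sorted(
        ((cosine(query, feat), label) for feat, label in database),
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in ranked[:k]:
        votes[label] += max(sim, 0.0)  # ignore negative similarities
    total = sum(votes.values())
    return [votes[c] / total if total else 0.0 for c in class_names]


def combine_scores(clip_scores, retr_scores, alpha=0.5):
    """Linear mix of CLIP-based and retrieval-based scores (assumed rule)."""
    return [(1 - alpha) * c + alpha * r for c, r in zip(clip_scores, retr_scores)]


# Toy data: 2-D stand-ins for masked segment features.
class_names = ["cat", "dog", "lamp"]
database = [
    ([1.0, 0.0], "cat"),
    ([0.9, 0.1], "cat"),
    ([0.0, 1.0], "dog"),
    ([0.1, 0.9], "dog"),
    ([0.7, 0.7], "lamp"),
]
query = [0.95, 0.05]           # feature of one masked segment in the input image
clip_scores = [0.4, 0.5, 0.1]  # hypothetical CLIP-based scores for the same segment

retr = retrieval_scores(query, database, class_names, k=3)
final = combine_scores(clip_scores, retr, alpha=0.5)
prediction = class_names[final.index(max(final))]
print(prediction)
```

In this toy case the hypothetical CLIP scores alone favor "dog", while the retrieved neighbors are mostly "cat" entries, so the combined score selects "cat". A real system would use high-dimensional features and an approximate nearest-neighbor index rather than a full sort.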