Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation
By: Seogkyu Jeon, Kibeom Hong, Hyeran Byun
Potential Business Impact:
Helps computers see objects in new places.
Recent domain generalized semantic segmentation (DGSS) studies have achieved notable improvements by distilling semantic knowledge from Vision-Language Models (VLMs). However, they overlook the semantic misalignment between visual and textual contexts, which arises from the rigidity of a fixed context prompt learned on a single source domain. To address this, we present a novel domain generalization framework for semantic segmentation, namely the Domain-aware Prompt-driven Masked Transformer (DPMFormer). First, we introduce domain-aware prompt learning to facilitate semantic alignment between visual and textual cues. To capture various domain-specific properties from a single source dataset, we propose domain-aware contrastive learning along with a texture perturbation that diversifies the observable domains. Lastly, to establish a framework resilient to diverse environmental changes, we propose domain-robust consistency learning, which guides the model to minimize discrepancies between predictions on the original and augmented images. Through experiments and analyses, we demonstrate the superiority of the proposed framework, which establishes a new state-of-the-art on various DGSS benchmarks. The code is available at https://github.com/jone1222/DPMFormer.
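The consistency-learning idea above can be sketched in a few lines: penalize the gap between per-pixel class distributions predicted for an image and for its perturbed counterpart. This is a minimal illustrative sketch, not the paper's implementation; the additive-noise "perturbation" is a stand-in for the texture perturbation described in the abstract, and all function names here are hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def consistency_loss(logits_orig, logits_aug):
    # Mean squared discrepancy between per-pixel class distributions
    # of the original and the augmented image.
    p = softmax(logits_orig)
    q = softmax(logits_aug)
    return float(np.mean((p - q) ** 2))

rng = np.random.default_rng(0)
# Toy per-pixel logits with shape (H, W, num_classes).
logits = rng.normal(size=(4, 4, 3))
# Stand-in perturbation: small additive noise on the logits.
logits_aug = logits + 0.1 * rng.normal(size=logits.shape)

loss = consistency_loss(logits, logits_aug)
```

In practice the same idea is typically applied to the segmentation head's outputs during training, with the perturbed view produced by the texture augmentation rather than logit noise, so that the loss pushes predictions to be invariant to appearance changes.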
Similar Papers
Generalizing Vision-Language Models with Dedicated Prompt Guidance
CV and Pattern Recognition
Helps AI understand new things better.
Constrained Prompt Enhancement for Improving Zero-Shot Generalization of Vision-Language Models
CV and Pattern Recognition
Helps computers understand pictures and words better.
Language-Driven Dual Style Mixing for Single-Domain Generalized Object Detection
CV and Pattern Recognition
Teaches computers to see in new places.