LangDA: Building Context-Awareness via Language for Domain Adaptive Semantic Segmentation
By: Chang Liu, Bavesh Balaji, Saad Hossain, and more
Potential Business Impact:
Teaches computers to understand pictures without labels.
Unsupervised domain adaptation for semantic segmentation (DASS) aims to transfer knowledge from a label-rich source domain to a target domain with no labels. Two key approaches in DASS are (1) vision-only approaches using masking or multi-resolution crops, and (2) language-based approaches that use generic class-wise prompts informed by the target domain (e.g. "a {snowy} photo of a {class}"). However, the former is susceptible to noisy pseudo-labels that are biased toward the source domain, while the latter does not fully capture the intricate spatial relationships between objects -- key for dense prediction tasks. To this end, we propose LangDA. LangDA addresses these challenges by, first, learning contextual relationships between objects via VLM-generated scene descriptions (e.g. "a pedestrian is on the sidewalk, and the street is lined with buildings."). Second, LangDA aligns the entire image's features with the text representation of this context-aware scene caption, learning generalized representations via text. With this, LangDA sets a new state-of-the-art across three DASS benchmarks, outperforming existing methods by 2.6%, 1.4% and 3.9%.
Similar Papers
SemiDAViL: Semi-supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation
CV and Pattern Recognition
Helps computers understand pictures better with words.
Balanced Learning for Domain Adaptive Semantic Segmentation
CV and Pattern Recognition
Helps computers better understand pictures of things.
Domain Adaptation for Image Classification of Defects in Semiconductor Manufacturing
CV and Pattern Recognition
Finds tiny flaws in computer chips faster.