OMUDA: Omni-level Masking for Unsupervised Domain Adaptation in Semantic Segmentation
By: Yang Ou, Xiongwei Zhao, Xinye Yang, and more
Potential Business Impact:
Helps computers see in new places without new labels.
Unsupervised domain adaptation (UDA) enables semantic segmentation models to generalize from a labeled source domain to an unlabeled target domain. However, existing UDA methods still struggle to bridge the domain gap due to cross-domain contextual ambiguity, inconsistent feature representations, and class-wise pseudo-label noise. To address these challenges, we propose Omni-level Masking for Unsupervised Domain Adaptation (OMUDA), a unified framework that introduces hierarchical masking strategies across distinct representation levels. Specifically, OMUDA comprises: 1) a Context-Aware Masking (CAM) strategy that adaptively distinguishes foreground from background to balance global context and local details; 2) a Feature Distillation Masking (FDM) strategy that enhances robust and consistent feature learning through knowledge transfer from pre-trained models; and 3) a Class Decoupling Masking (CDM) strategy that mitigates the impact of noisy pseudo-labels by explicitly modeling class-wise uncertainty. This hierarchical masking paradigm effectively reduces domain shift at the contextual, representational, and categorical levels, providing a unified solution beyond existing approaches. Extensive experiments on multiple challenging cross-domain semantic segmentation benchmarks validate the effectiveness of OMUDA. Notably, on the SYNTHIA->Cityscapes and GTA5->Cityscapes tasks, OMUDA can be seamlessly integrated into existing UDA methods and consistently achieves state-of-the-art results, with an average improvement of 7%.
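The abstract describes the masking strategies only at a conceptual level. The snippet below is a minimal sketch, assuming a standard PyTorch teacher-student self-training setup, of two of the ideas named above: class-wise pseudo-label filtering in the spirit of CDM, and random patch masking of target images as a crude stand-in for CAM. The function names, thresholds, and patch sizes are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def class_decoupled_mask(logits, base_threshold=0.9):
    """Filter pseudo-labels with per-class confidence thresholds (CDM-style sketch).

    logits: (B, C, H, W) teacher predictions on target-domain images.
    Returns pseudo-labels (B, H, W) and a boolean keep-mask of the same shape.
    """
    probs = torch.softmax(logits, dim=1)
    conf, pseudo = probs.max(dim=1)          # per-pixel confidence and label
    num_classes = logits.shape[1]

    keep = torch.zeros_like(conf, dtype=torch.bool)
    for c in range(num_classes):
        cls_pix = pseudo == c
        if cls_pix.any():
            # Scale the global threshold by the class's mean confidence so that
            # rare or hard classes are not filtered away entirely (hypothetical rule).
            cls_thr = base_threshold * conf[cls_pix].mean().clamp(max=1.0)
            keep |= cls_pix & (conf >= cls_thr)
    return pseudo, keep


def patch_mask_image(images, mask_ratio=0.5, patch=32):
    """Zero out random square patches of the target image (generic masking step)."""
    B, _, H, W = images.shape
    gh, gw = max(H // patch, 1), max(W // patch, 1)
    grid = (torch.rand(B, 1, gh, gw, device=images.device) > mask_ratio).float()
    mask = F.interpolate(grid, size=(H, W), mode="nearest")
    return images * mask
```

In a typical self-training loop, the masked target image would be fed to the student, while the teacher's predictions on the unmasked image supply pseudo-labels, with the keep-mask weighting the per-pixel loss; how OMUDA couples CAM, FDM, and CDM beyond this generic pattern is detailed in the paper itself.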
Similar Papers
Masked Feature Modeling Enhances Adaptive Segmentation
CV and Pattern Recognition
Teaches computers to see in new places.
DUDA: Distilled Unsupervised Domain Adaptation for Lightweight Semantic Segmentation
CV and Pattern Recognition
Helps small computer programs learn like big ones.
Unified modality separation: A vision-language framework for unsupervised domain adaptation
CV and Pattern Recognition
Helps computers learn from pictures and words better.