UniChange: Unifying Change Detection with Multimodal Large Language Model
By: Xu Zhang, Danyang Li, Xiaohang Dong, and more
Potential Business Impact:
Lets computers see changes in pictures better.
Change detection (CD) is a fundamental task for monitoring and analyzing land-cover dynamics. Although recent high-performance models and high-quality datasets have significantly advanced the field, a critical limitation persists: current models typically acquire knowledge from only a single type of annotation and cannot jointly exploit diverse binary change detection (BCD) and semantic change detection (SCD) datasets. This constraint leads to poor generalization and limited versatility. Recent advances in Multimodal Large Language Models (MLLMs) open new possibilities for a unified CD framework. We leverage the language priors and unification capabilities of MLLMs to develop UniChange, the first MLLM-based unified change detection model. UniChange integrates generative language abilities with specialized CD functionalities. It unifies the BCD and SCD tasks by introducing three special tokens: [T1], [T2], and [CHANGE]. Furthermore, UniChange uses text prompts to specify which change categories to identify, eliminating the reliance on predefined classification heads. This design allows UniChange to learn effectively from multi-source datasets, even when their class definitions conflict. Experiments on four public benchmarks (WHU-CD, S2Looking, LEVIR-CD+, and SECOND) show state-of-the-art (SOTA) performance, with IoU scores of 90.41, 53.04, 78.87, and 57.62, respectively, surpassing all previous methods. The code is available at https://github.com/Erxucomeon/UniChange.
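The key idea in the abstract is that a text prompt, not a fixed classification head, selects which change category to segment. Here is a minimal sketch of that idea under simplifying assumptions. It is not UniChange's implementation. Each dataset keeps its own class-index-to-name table, and all tables, function names, and toy data are hypothetical. A semantic target is derived per text query from two-date class maps, and a BCD label is used as-is. Predictions are scored with the standard change-class IoU, TP / (TP + FP + FN). The special tokens [T1], [T2], and [CHANGE] are not modeled, and each benchmark's exact evaluation protocol may differ.

```python
# Hypothetical sketch: prompt-conditioned change targets + change-class IoU.
# Not the UniChange code; names and data are illustrative only.
from typing import Dict, List

Mask = List[List[int]]  # 2-D grid of 0/1 values


def binary_target(change_map: Mask) -> Mask:
    """BCD-style dataset: the label already is a 0/1 change map."""
    return [row[:] for row in change_map]


def semantic_target(t1: List[List[int]], t2: List[List[int]],
                    names: Dict[int, str], query: str) -> Mask:
    """SCD-style dataset (simplified): 1 where the class changed between
    the two dates AND the post-change (T2) class is named `query`.

    `names` maps this dataset's own class indices to text, so datasets
    with conflicting index assignments need no shared classification head.
    """
    return [
        [int(a != b and names[b] == query) for a, b in zip(r1, r2)]
        for r1, r2 in zip(t1, t2)
    ]


def iou(pred: Mask, target: Mask) -> float:
    """IoU of the positive (changed) class: TP / (TP + FP + FN)."""
    tp = fp = fn = 0
    for pr, tr in zip(pred, target):
        for p, t in zip(pr, tr):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = tp + fp + fn
    # Convention assumed here: no positives anywhere counts as a perfect match.
    return tp / denom if denom else 1.0


if __name__ == "__main__":
    # Two hypothetical datasets that disagree on which index means "building".
    names_a = {0: "background", 1: "building", 2: "tree"}
    names_b = {0: "background", 1: "tree", 2: "building"}

    target_a = semantic_target([[0, 0], [2, 1]], [[1, 0], [2, 1]], names_a, "building")
    target_b = semantic_target([[0, 0], [1, 0]], [[2, 0], [1, 1]], names_b, "building")
    print(target_a)  # [[1, 0], [0, 0]]
    print(target_b)  # [[1, 0], [0, 0]]

    pred = [[1, 1], [0, 0]]
    print(iou(pred, target_a))  # 0.5
```

Because each target is computed from a class name rather than a head index, datasets A and B can assign "building" to different indices and still supply targets for the same query. A binary dataset fits the same pipeline through `binary_target`.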