ChangeCLIP: Remote sensing change detection with multimodal vision-language representation learning

journal 2024 · Sijun Dong, Libo Wang, Bo Du, Xiaoliang Meng

Remote sensing change detection (RSCD), which aims to identify surface changes from bitemporal images, is important for many applications, such as environmental protection and disaster monitoring. Over the last decade, driven by advances in artificial intelligence, many deep-learning-based change detection methods have emerged and achieved significant breakthroughs. However, these methods focus mainly on visual representation learning while overlooking the potential of multimodal data. Recently, the vision-language foundation model CLIP has provided a new paradigm for multimodal AI, demonstrating impressive performance on downstream tasks. Following this trend, we introduce ChangeCLIP, a novel framework tailored to RSCD that leverages robust semantic information from image-text pairs. Specifically, we reconstruct the original CLIP to extract bitemporal features and propose a novel differential features compensation module to capture the detailed semantic changes between them. In addition, we propose a vision-language-driven decoder that combines the results of image-text encoding with the visual features of the decoding stage, thereby enhancing the image semantics. ChangeCLIP achieves state-of-the-art IoU on five well-known change detection datasets: LEVIR-CD (85.20%), LEVIR-CD+ (75.63%), WHUCD (90.15%), CDD (95.87%), and SYSU-CD (71.41%). The code and pretrained models of ChangeCLIP will be publicly available at https://github.com/dyzy41/ChangeCLIP.

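| Task | Dataset | Model | Metric Name | Metric Value (%) | Global Rank |
|---|---|---|---|---|---|
| Change Detection | CDD Dataset (season-varying) | ChangeCLIP | F1-Score | 97.89 | #4 |
| Change Detection | CDD Dataset (season-varying) | ChangeCLIP | Precision | 98.02 | #1 |
| Change Detection | CDD Dataset (season-varying) | ChangeCLIP | F1 | 97.89 | #1 |
| Change Detection | CDD Dataset (season-varying) | ChangeCLIP | Overall Accuracy | 99.48 | #1 |
| Change Detection | CDD Dataset (season-varying) | ChangeCLIP | IoU | 95.87 | #1 |
| Change Detection | CDD Dataset (season-varying) | ChangeCLIP | Recall | 97.77 | #1 |
| Change Detection | LEVIR-CD | ChangeCLIP | F1 | 92.01 | #7 |
| Change Detection | LEVIR-CD | ChangeCLIP | IoU | 85.20 | #6 |
| Change Detection | LEVIR-CD | ChangeCLIP | Overall Accuracy | 99.20 | #1 |
| Change Detection | LEVIR-CD | ChangeCLIP | Precision | 93.40 | #1 |
| Change Detection | LEVIR-CD | ChangeCLIP | Recall | 90.67 | #4 |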
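
The reported metrics are not independent: for a binary change mask, F1 and IoU both follow from precision and recall. The short Python sketch below uses the standard definitions (F1 as the harmonic mean of precision and recall, IoU = TP / (TP + FP + FN)) to cross-check the listed values. The function names and the `reported` dictionary are illustrative only and are not taken from the ChangeCLIP code base. Because the inputs are already rounded to two decimals, the recomputed values can differ from the listed ones by a few hundredths of a percentage point.

```python
def f1_from_pr(precision_pct, recall_pct):
    """F1 as the harmonic mean of precision and recall (all in percent)."""
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


def iou_from_pr(precision_pct, recall_pct):
    """IoU of the change class from precision and recall (all in percent).

    From IoU = TP / (TP + FP + FN) it follows that 1/IoU = 1/P + 1/R - 1,
    i.e. IoU = P*R / (P + R - P*R) with P and R as fractions.
    """
    p, r = precision_pct / 100.0, recall_pct / 100.0
    return 100.0 * p * r / (p + r - p * r)


# Values copied from the results listed above (percent).
reported = {
    "CDD": {"precision": 98.02, "recall": 97.77, "f1": 97.89, "iou": 95.87},
    "LEVIR-CD": {"precision": 93.40, "recall": 90.67, "f1": 92.01, "iou": 85.20},
}

for name, m in reported.items():
    f1 = f1_from_pr(m["precision"], m["recall"])
    iou = iou_from_pr(m["precision"], m["recall"])
    print(f"{name}: F1 {f1:.2f} (reported {m['f1']:.2f}), "
          f"IoU {iou:.2f} (reported {m['iou']:.2f})")
```

Overall Accuracy cannot be recovered this way, since it also depends on the number of true negatives (unchanged pixels), which the listing does not report.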

Methods