Referring Change Detection in Remote Sensing Imagery
WACV 2026
Yilmaz Korkmaz1
Jay N. Paranjape1
Celso M. de Melo2
Vishal M. Patel1
1 Johns Hopkins University, Baltimore, USA     2 DEVCOM U.S. Army Research Laboratory, Adelphi, USA
{ykorkma1, jparanj2, vpatel36}@jhu.edu     celso.m.demelo.civ@army.mil
[Paper]
[Supplementary]
[GitHub]
teaser

Abstract

Change detection in remote sensing imagery is essential for applications such as urban planning, environmental monitoring, and disaster management. Traditional change detection methods typically identify all changes between two temporal images without distinguishing the types of transitions, producing results that may not align with specific user needs. Although semantic change detection methods attempt to address this by categorizing changes into predefined classes, they rely on rigid class definitions and fixed model architectures, making it difficult to mix datasets with different label sets or reuse models across tasks, since the output channels are tightly coupled with the number and type of semantic classes. To overcome these limitations, we introduce Referring Change Detection (RCD), which leverages natural language prompts to detect specific classes of changes in remote sensing images. By integrating language understanding with visual analysis, our approach allows users to specify the exact type of change they are interested in. However, training models for RCD is challenging due to the limited availability of annotated data and the severe class imbalance in existing datasets. To address this, we propose a two-stage framework consisting of (i) RCDNet, a cross-modal fusion network designed for referring change detection, and (ii) RCDGen, a diffusion-based synthetic data generation pipeline that produces realistic post-change images and change maps for a specified category using only the pre-change image, without relying on semantic segmentation masks, thereby significantly lowering the barrier to scalable data creation. Experiments across multiple datasets show that our framework enables scalable and targeted change detection. Code will be made publicly available on GitHub.


Referring Change Detection Network (RCDNet)

RCDNet overview

Figure 2. RCDNet and its training scheme. T1- and T2-images correspond to distinct time points, the pre-change and post-change states. The change category (class) is randomly selected from the unique values in the semantic change map and guides the network through text embeddings. The change map is binarized for the specified category, enabling the use of a standard binary cross-entropy loss as the training objective.
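To make the training scheme above concrete, the following is a minimal PyTorch-style sketch of one training step: a change category is sampled from the semantic change map, the map is binarized for that category, the category name is encoded as a text embedding, and a binary cross-entropy loss is computed on the predicted map. The `model` and `text_encoder` interfaces are hypothetical placeholders for illustration; the actual RCDNet architecture is described in the paper.

```python
import torch
import torch.nn.functional as F

def rcd_training_step(model, text_encoder, t1_img, t2_img, semantic_change_map, class_names):
    """One referring-change-detection training step (hedged sketch, not the paper's exact code).

    t1_img, t2_img:       (B, 3, H, W) pre-/post-change images
    semantic_change_map:  (B, H, W) integer map of change categories (0 = no change)
    class_names:          list mapping category index -> text label, e.g. ["no change", "building", ...]
    """
    # Sample one change category present in the batch (assumes at least one changed pixel exists).
    present = torch.unique(semantic_change_map)
    present = present[present > 0]
    category = present[torch.randint(len(present), (1,))].item()

    # Binarize the semantic change map for the sampled category.
    target = (semantic_change_map == category).float()          # (B, H, W)

    # Encode the category name into a text embedding that guides the network.
    prompt = class_names[category]                               # e.g. "building"
    text_emb = text_encoder(prompt)                              # embedding shape is model-specific

    # Predict a single-channel change map conditioned on both images and the prompt.
    logits = model(t1_img, t2_img, text_emb).squeeze(1)          # (B, H, W)

    # Standard binary cross-entropy objective on the binarized map.
    loss = F.binary_cross_entropy_with_logits(logits, target)
    return loss
```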


Synthetic Data Generation (RCDGen)

We introduce RCDGen, a diffusion-based synthetic data generation pipeline that produces realistic post-change images and change maps for a specified category using only the pre-change image, without requiring semantic segmentation masks.
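As a rough illustration of how a text-conditioned diffusion model can turn a pre-change image and a prompt into a candidate post-change image, the sketch below uses the off-the-shelf image-to-image pipeline from Hugging Face diffusers. This is an assumption made purely for illustration: RCDGen's actual conditioning, training, and change-map generation are described in the paper and are not reproduced by this snippet.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Off-the-shelf text-conditioned image-to-image diffusion (illustrative stand-in, not RCDGen itself).
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Pre-change (T1) image from a real dataset; the path is a placeholder.
t1_image = Image.open("t1_patch.png").convert("RGB").resize((512, 512))

# Text prompt specifying the desired change category.
prompt = "aerial view, new buildings constructed on the empty lot"

# Generate a candidate post-change (T2) image conditioned on the T1 image and the prompt.
# `strength` controls how far the result may deviate from the pre-change image.
t2_image = pipe(
    prompt=prompt,
    image=t1_image,
    strength=0.6,
    guidance_scale=7.5,
).images[0]

t2_image.save("t2_synthetic.png")
```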

RCDGen overview

Figure 3. RCDGen overview: the post-change image and the corresponding change map are generated conditioned on the pre-change image and a text prompt.

RCDGen ablation

Figure 4. Synthetic samples generated by our pipeline: T1-images (pre-change) are sourced from real datasets, while T2-images (post-change) and corresponding change maps are generated. Labels on the left side of each image pair indicate the specific change category.
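To show how such synthetic triplets can be folded into RCDNet training alongside real data, here is a minimal PyTorch `Dataset` sketch yielding (T1, T2, prompt, binary change map) tuples. The index-file layout and field names are hypothetical; the point is that the change category is carried as free-form text rather than a fixed output channel, so real and synthetic samples with different label vocabularies can be mixed freely.

```python
import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class ReferringCDDataset(Dataset):
    """Yields (t1, t2, prompt, binary_mask) tuples for referring change detection.

    Expects a JSON index listing samples as
    {"t1": ..., "t2": ..., "mask": ..., "category": "building"}  (layout is hypothetical).
    Real and synthetic (RCDGen-style) samples can share this format because the
    change category is expressed as text rather than a fixed class channel.
    """

    def __init__(self, index_file: str):
        self.samples = json.loads(Path(index_file).read_text())
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        t1 = self.to_tensor(Image.open(s["t1"]).convert("RGB"))
        t2 = self.to_tensor(Image.open(s["t2"]).convert("RGB"))
        # Binary map for the referred category (already binarized for synthetic samples).
        mask = self.to_tensor(Image.open(s["mask"]).convert("L")).squeeze(0)
        mask = (mask > 0.5).float()
        prompt = f"changes involving {s['category']}"
        return t1, t2, prompt, mask
```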

Additional synthetic samples 1

Figure 5. Additional synthetic examples showcasing varied change categories and change maps generated by RCDGen.

Additional synthetic samples 2

Figure 6. More synthetic samples illustrating diverse post-change appearances and associated change maps.

BibTeX

@inproceedings{korkmaz2026referring,
  title     = {Referring Change Detection in Remote Sensing Imagery},
  author    = {Korkmaz, Yilmaz and Paranjape, Jay N. and de Melo, Celso M. and Patel, Vishal M.},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2026}
}