Traditional change detection methods can identify where a change occurred in satellite imagery, but they cannot explain in natural language what changed. Existing remote sensing change captioning datasets typically describe overall image-level differences, leaving fine-grained, localized semantic reasoning largely unexplored. To close this gap, researchers have introduced RSRCC (Remote Sensing Regional Change Comprehension), a new benchmark for change question-answering that contains roughly 126,000 questions, split into 87k training, 17.1k validation, and 22k test instances. According to the paper published on arXiv, RSRCC is built around localized, change-specific questions, each of which requires reasoning about a particular semantic change. The authors state that this is the first remote sensing change question-answering benchmark designed explicitly for such fine-grained, reasoning-based supervision.
The RSRCC Benchmark
Unlike prior datasets that focus on holistic image captions, RSRCC emphasizes regional change comprehension. Each question targets a specific change region and expects a natural language answer that explains what changed. The dataset covers a variety of semantic categories extracted from remote sensing imagery. The large scale and targeted nature of the questions aim to advance the ability of vision-language models to perform localized reasoning.
How It Works: The Curation Pipeline
To construct RSRCC, the authors introduce a hierarchical semi-supervised curation pipeline that uses Best-of-N ranking as a critical final ambiguity-resolution stage. The pipeline works in three steps:
- Candidate Extraction: Change regions are first extracted from semantic segmentation masks.
- Initial Screening: Candidates are screened using an image-text embedding model to filter obvious mismatches.
- Final Validation: The surviving candidates are validated through retrieval-augmented vision-language curation with Best-of-N ranking, which selects the best match among multiple candidates to resolve ambiguity.
This process enables scalable filtering of noisy and ambiguous candidates while preserving semantically meaningful changes. The combination of retrieval-augmented curation and Best-of-N ranking is what distinguishes this pipeline as an approach to remote sensing benchmark construction.
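The three steps above can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names, the toy masks, the class names, and both scoring functions are hypothetical stand-ins. In the pipeline the article describes, the screening score would come from an image-text embedding model and the final ranking from a retrieval-augmented vision-language model.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence

Mask = Sequence[Sequence[int]]
Region = Dict[str, object]


def extract_change_regions(before: Mask, after: Mask) -> List[Region]:
    """Step 1: group changed pixels into 4-connected regions per class transition."""
    rows, cols = len(before), len(before[0])
    seen = [[False] * cols for _ in range(rows)]
    regions: List[Region] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or before[r][c] == after[r][c]:
                continue  # already grouped, or no change at this pixel
            transition = (before[r][c], after[r][c])
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill over same-transition pixels
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and (before[ny][nx], after[ny][nx]) == transition):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append({"transition": transition, "pixels": pixels})
    return regions


def screen(regions: List[Region], similarity: Callable[[Region], float],
           threshold: float) -> List[Region]:
    """Step 2: drop candidates whose (assumed) similarity score is below a threshold."""
    return [reg for reg in regions if similarity(reg) >= threshold]


def best_of_n(region: Region, candidates: Sequence[str],
              score: Callable[[Region, str], float]) -> str:
    """Step 3: keep the highest-scoring of N candidate descriptions."""
    return max(candidates, key=lambda text: score(region, text))


# Toy data (hypothetical): 0 = vegetation, 1 = water, 2 = building.
CLASS_NAMES = {0: "vegetation", 1: "water", 2: "building"}
before = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 1]]
after = [[2, 2, 0, 0],
         [2, 2, 0, 0],
         [0, 0, 0, 0]]

regions = extract_change_regions(before, after)

# Stand-in for an embedding-based score: here, simply favor larger regions.
kept = screen(regions, similarity=lambda reg: len(reg["pixels"]) / 4, threshold=0.5)

# Stand-in for a vision-language ranking score: count matching class names.
def toy_score(region: Region, text: str) -> float:
    return sum(CLASS_NAMES[k] in text.lower() for k in region["transition"])

candidates = [
    "A road was paved.",
    "A building was constructed on vegetation.",
    "Water receded.",
]
answer = best_of_n(kept[0], candidates, toy_score)
print(answer)
```

The scoring functions are passed in as callables so that the toy heuristics can be swapped for real models without changing the control flow of the three stages.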
Significance and Availability
RSRCC is designed to push the boundaries of remote sensing AI by requiring models to understand not just that a change occurred, but what changed in a specific location. The benchmark is released under a CC BY 4.0 license and is available online at the project page (linked in the paper). For enterprise technology leaders, this benchmark could enable more precise automated monitoring of infrastructure, agriculture, or urban development from satellite data, though the paper does not explicitly discuss commercial applications. The authors are Roie Kazoom, Yotam Gigi, George Leifman, Tomer Shekel, and Genady Beryozkin.
Technical Details
| Attribute | Details |
|---|---|
| Total Questions | Approximately 126,000 (the rounded splits below sum to 126,100) |
| Training | 87,000 |
| Validation | 17,100 |
| Test | 22,000 |
| Task Type | Change question-answering (localized) |
| Curation Method | Hierarchical semi-supervised with Best-of-N ranking |
| License | CC BY 4.0 |
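The split sizes reported above are rounded, so they add up to slightly more than the headline total. A short calculation, using only the figures in the table, shows the sum and the approximate share of each split:

```python
# Split sizes as reported in the table (rounded figures).
splits = {"train": 87_000, "validation": 17_100, "test": 22_000}

total = sum(splits.values())
# Percentage share of each split, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)   # 126100
print(shares)  # {'train': 69.0, 'validation': 13.6, 'test': 17.4}
```

In other words, the benchmark uses roughly a 69/14/17 train/validation/test division.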
The dataset and code are intended to facilitate research in remote sensing vision-language understanding. By focusing on regional change comprehension, RSRCC addresses a gap in existing benchmarks and provides a challenging test for AI models.