Rapid post-event landslide mapping is critical for disaster response but remains challenging because landslide pixels are extremely rare relative to background. According to a study published on arXiv by researcher Huong Binh Vu, a hybrid approach combining a Convolutional Neural Network (CNN) with a Geospatial Foundation Model (GFM) improves pixel-level landslide detection, raising the test F1 score from 59.9% for a U-Net baseline to 64.5%.
The research evaluates Clay v1.5, a Geospatial Foundation Model, on the Landslide4Sense (L4S) benchmark dataset. The dataset contains 3,799 training chips, each with 14 bands drawn from Sentinel-2 imagery and terrain data, and only about 2% of pixels are labeled as landslide.
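To see why a positive rate of roughly 2% makes evaluation tricky, the sketch below compares accuracy with F1 on a toy mask. The pixel counts, the `pixel_metrics` helper, and the hypothetical detector are illustrative assumptions, not L4S data or code from the paper.

```python
def pixel_metrics(y_true, y_pred):
    """Return (accuracy, f1) for flat lists of binary pixel labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy mask with 1,000 pixels, 2% of them landslide (made-up numbers).
y_true = [1] * 20 + [0] * 980

# Predictor A: labels every pixel as background.
all_background = [0] * 1000

# Predictor B (hypothetical): finds 12 of the 20 landslide pixels
# and raises 8 false alarms on background pixels.
some_detector = [1] * 12 + [0] * 8 + [1] * 8 + [0] * 972

acc_bg, f1_bg = pixel_metrics(y_true, all_background)
acc_det, f1_det = pixel_metrics(y_true, some_detector)

print(f"all-background: accuracy={acc_bg:.3f}, F1={f1_bg:.3f}")
print(f"toy detector:   accuracy={acc_det:.3f}, F1={f1_det:.3f}")
```

In this toy case the all-background predictor reaches 98% accuracy with an F1 of zero, while the detector's accuracy barely moves even though its F1 is 0.6. This is the usual reason imbalanced segmentation tasks are scored with F1 rather than accuracy.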
Methodology
The study compares three strategies: Clay as the primary encoder with multi-scale residual terrain fusion, a U-Net backbone augmented with Clay semantic context at the bottleneck, and a standard U-Net baseline. The hybrid U-Net + Clay model employs two-stage Low-Rank Adaptation (LoRA).
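As a rough illustration of the low-rank adaptation idea, the following minimal sketch adds a trainable rank-r update to a frozen weight matrix. It shows a single generic LoRA layer in plain Python. The class name `LoRALinear`, the dimensions, and the initialization are assumptions made for illustration, and the sketch does not reproduce the study's two-stage schedule, which is not detailed here.

```python
import random


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pretrained weight, kept frozen
        # A starts with small random values, B starts at zero, so the
        # adapted layer initially matches the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        # Only A (rank x d_in) and B (d_out x rank) would be trained.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


rng = random.Random(1)
W = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(8)]
layer = LoRALinear(W, rank=2, alpha=4.0)
x = [1.0] * 8

y_init = layer.forward(x)      # identical to the frozen layer's output
layer.B[0][0] = 0.5            # stand-in for one training update to B
y_adapted = layer.forward(x)   # only output row 0 changes

print(layer.trainable_params(), "trainable values vs", 8 * 8, "in W")
```

The point of the design is that the pretrained weights stay untouched while a much smaller set of values is trained: 32 instead of 64 in this toy case, with a far larger ratio at foundation-model scale.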
Results
The hybrid U-Net + Clay model achieved the best test F1 score, 64.5% ± 1.8% over three seeds, surpassing the U-Net baseline (59.9%) by 4.6 percentage points and the Clay-only backbone (55.2% ± 3.6%) by 9.3 points. Clay as a standalone encoder underperformed the U-Net, which the study attributes to the absence of multi-scale skip connections. However, its pretrained representations consistently improved performance when injected as auxiliary context.
| Model | Test F1 Score |
|---|---|
| U-Net baseline | 59.9% |
| Clay-only backbone | 55.2% ± 3.6% |
| Hybrid U-Net + Clay (LoRA) | 64.5% ± 1.8% |
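Scores of the form "mean ± spread" summarize several training runs with different random seeds. A minimal sketch of that aggregation follows. The per-seed values are placeholders invented for illustration, not results from the paper, and whether the reported spread is a sample or population standard deviation is not stated here.

```python
from statistics import mean, pstdev, stdev

# Placeholder per-seed test F1 scores in percent.
# NOT values from the paper; chosen only to make the arithmetic easy to follow.
per_seed_f1 = [60.0, 62.0, 64.0]

f1_mean = mean(per_seed_f1)
f1_sample_std = stdev(per_seed_f1)       # divides by n - 1
f1_population_std = pstdev(per_seed_f1)  # divides by n


def summarize(center, spread):
    """Format a score the way the table above does."""
    return f"{center:.1f}% ± {spread:.1f}%"


print("sample std:    ", summarize(f1_mean, f1_sample_std))
print("population std:", summarize(f1_mean, f1_population_std))
```

With only three seeds the two conventions differ noticeably (2.0 versus about 1.6 here), so it is worth checking which one a paper uses before comparing error bars.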
Implications
The findings suggest that Geospatial Foundation Models are most effective for landslide detection when they complement spatially detailed convolutional architectures rather than replace them. This hybrid approach offers a path to more reliable automated landslide mapping, which is essential for rapid disaster response. The paper's code and data are available through arXiv.