Study Finds Hybrid CNN-Clay Model Improves Landslide Detection Accuracy Over Baseline

A study evaluates Clay v1.5, a Geospatial Foundation Model, for pixel-level landslide segmentation on the Landslide4Sense benchmark. The hybrid U-Net + Clay model with two-stage LoRA achieves a test F1 of 64.5%, outperforming both the Clay-only backbone and a standard U-Net baseline.

iGEN Editorial

June 17, 2026

Rapid post-event landslide mapping is critical for disaster response, but it remains difficult because landslide pixels are vastly outnumbered by background pixels. According to a study published on arXiv by researcher Huong Binh Vu, a hybrid approach that combines a Convolutional Neural Network (CNN) with a Geospatial Foundation Model (GFM) improves pixel-level landslide detection over either component used alone.

The research evaluates Clay v1.5, a Geospatial Foundation Model, on the Landslide4Sense (L4S) benchmark dataset. The dataset contains 3,799 training chips, each with 14 bands drawn from Sentinel-2 imagery and terrain data. Only about 2% of pixels are positive landslide samples.
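The roughly 2% positive rate explains why a study like this reports F1 rather than plain accuracy. The following is a minimal sketch on a toy 100-pixel mask, not the study's evaluation code; the function names and the toy predictions are illustrative assumptions.

```python
def pixel_f1(pred, truth):
    """Return (precision, recall, F1) for the positive class of two flat 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def pixel_accuracy(pred, truth):
    """Fraction of pixels where prediction and label agree."""
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(truth)


# Toy mask (assumption): 100 pixels, 2 of them landslide, i.e. 2% positive.
truth = [1, 1] + [0] * 98
all_background = [0] * 100        # a model that never predicts landslide
one_hit = [1, 0, 1] + [0] * 97    # one correct landslide pixel, one false alarm

print(pixel_accuracy(all_background, truth))  # 0.98
print(pixel_f1(all_background, truth))        # (0.0, 0.0, 0.0)
print(pixel_f1(one_hit, truth))               # (0.5, 0.5, 0.5)
```

A model that labels every pixel as background scores 98% accuracy on this toy mask while detecting nothing, whereas its F1 is zero.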

Methodology

The study compares three strategies: using Clay as the primary encoder with multi-scale residual terrain fusion, a U-Net backbone augmented with Clay semantic context at the bottleneck, and a standard U-Net baseline. The hybrid U-Net + Clay model employs two-stage Low-Rank Adaptation (LoRA).
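For readers unfamiliar with LoRA, the standard formulation keeps a pretrained weight matrix frozen and trains only a low-rank update. The expression below is the general LoRA definition, not a detail taken from the study; this article does not state the rank, the scaling factor, which Clay layers are adapted, or what each of the two stages trains.

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k} \ \text{(frozen)},
\quad B \in \mathbb{R}^{d \times r},
\quad A \in \mathbb{R}^{r \times k},
\quad r \ll \min(d, k)
```

Only A and B are updated during fine-tuning, so the number of trainable parameters per adapted layer drops from d·k to r·(d + k).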

Results

The hybrid U-Net + Clay model achieved the best test F1 score, 64.5% ± 1.8% over three seeds. That is 9.3 percentage points above the Clay-only backbone (55.2% ± 3.6%) and 4.6 percentage points above the U-Net baseline (59.9%). The study attributes the weaker result of Clay as a standalone encoder to the absence of multi-scale skip connections. Its pretrained representations nonetheless consistently improved performance when injected as auxiliary context.

Model	Test F1 Score
U-Net baseline	59.9%
Clay-only backbone	55.2% ± 3.6%
Hybrid U-Net + Clay (LoRA)	64.5% ± 1.8%
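The "mean ± deviation over three seeds" figures in the table can be produced as in the sketch below. The per-seed scores here are placeholder values chosen for easy arithmetic, not numbers from the study, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since this article does not say which estimator the paper uses.

```python
from statistics import mean, stdev


def format_summary(scores):
    """Format per-seed scores in [0, 1] as 'mean% ± std%'."""
    m = mean(scores)
    s = stdev(scores)  # ASSUMPTION: sample standard deviation (n - 1 denominator)
    return f"{100 * m:.1f}% ± {100 * s:.1f}%"


# Placeholder per-seed F1 values (hypothetical, NOT the study's results).
placeholder_f1 = [0.60, 0.62, 0.64]
print(format_summary(placeholder_f1))  # 62.0% ± 2.0%
```

With only three seeds the spread is a rough estimate, so small differences between models deserve cautious reading.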

Implications

The findings suggest that Geospatial Foundation Models are most effective for landslide detection when they complement spatially detailed convolutional architectures rather than replace them. This hybrid approach offers a path to more reliable automated landslide mapping, which is essential for rapid disaster response. The paper's code and data are available through arXiv.
