What question did this study set out to answer?

The aim is to enhance predictive accuracy while maintaining model interpretability in landslide susceptibility mapping using machine learning.

April 23, 2026 · Open Access

Integration of information value with machine learning method for an enhanced predictive performance in landslide susceptibility mapping

Key Points

The aim is to enhance predictive accuracy while maintaining model interpretability in landslide susceptibility mapping using machine learning.
Developed a hybrid framework integrating information value with machine learning algorithms.
Implemented a three-stage feature selection protocol to retain significant geological variables.
Validated models using statistical metrics and spatial distribution analysis based on 372 GPS-documented sites.
XGBoost achieved the highest predictive performance with 96% accuracy and an AUC-ROC of 0.991.
SVM followed with 91% accuracy (AUC-ROC 0.944), and RF with 82% accuracy (AUC-ROC 0.888).
The machine learning models allocated a larger proportion to the Very High susceptibility class (XGBoost 39.75%, RF 26.33%, SVM 39.33%) than the IV method alone (12.83%).
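
The three-stage feature selection mentioned in the Key Points can be illustrated with a minimal sketch. The abstract names the three stages (correlation analysis, VIF, and entropy-based Information Gain), but it does not give thresholds or drop rules, so the cut-offs `corr_max`, `vif_max`, and `ig_min`, the quantile binning, and the function names below are all assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor per column: 1 / (1 - R^2) from regressing it on the others."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1.0 - (y - A @ coef).var() / y.var()
        out[j] = 1.0 / max(1.0 - r2, 1e-12)
    return out

def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(x, y, bins=10):
    """H(y) - H(y | x), with x discretised into quantile bins (binning is an assumption)."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    b = np.digitize(x, edges)
    cond = sum((b == k).mean() * entropy(y[b == k]) for k in np.unique(b))
    return entropy(y) - cond

def select_features(X, y, names, corr_max=0.8, vif_max=10.0, ig_min=0.01):
    """Hypothetical three-stage filter; all thresholds are illustrative assumptions."""
    keep = list(range(X.shape[1]))
    # Stage 1: for each highly correlated pair, drop the later feature.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            if i in keep and j in keep and corr[i, j] > corr_max:
                keep.remove(j)
    # Stage 2: repeatedly drop the feature with the highest VIF above the limit.
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= vif_max:
            break
        keep.pop(worst)
    # Stage 3: keep only features that carry information about the label.
    keep = [k for k in keep if information_gain(X[:, k], y) >= ig_min]
    return [names[k] for k in keep]
```

Because Information Gain is computed on binned values rather than on a linear fit, the last stage can score a factor whose relationship with landslide occurrence is non-linear, which is the property the abstract attributes to the protocol.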

Abstract

To balance predictive accuracy with model interpretability in landslide susceptibility mapping, this study integrates Information Value (IV) with three machine learning algorithms, Random Forest (RF), Support Vector Machine (SVM), and XGBoost, in a hybrid framework for landslide susceptibility assessment. The main methodological innovations are: 1) a sequential three-stage feature selection protocol combining correlation analysis, Variance Inflation Factor (VIF), and entropy-based Information Gain, which preserves geologically significant variables with non-linear relationships; 2) an IV-guided non-landslide sampling strategy that systematically reduces spatial bias by restricting sampling to the lowest susceptibility tertile; and 3) a comprehensive validation framework integrating statistical metrics, spatial distribution analysis, and spatial concordance verification using 372 GPS-documented inventory sites (74.5%), overlaid post hoc on the final susceptibility maps. Using slope, aspect, elevation, curvature, Topographic Wetness Index (TWI), annual rainfall, geology, soil type, Normalized Difference Vegetation Index (NDVI), land use/land cover (LULC), and proximity to roads, rivers, and faults as the causative factors identified through rigorous screening, XGBoost achieved the highest performance (96% accuracy, AUC-ROC 0.991 [95% CI: 0.979–0.995], 96% F1-score), followed by SVM (91% accuracy, AUC-ROC 0.944 [95% CI: 0.932–0.956]) and RF (82% accuracy, AUC-ROC 0.888 [95% CI: 0.873–0.903]). Classification with Jenks natural breaks showed that the machine learning models allocate substantially larger proportions to the Very High susceptibility zone (XGBoost: 39.75%, RF: 26.33%, SVM: 39.33%) than the IV method (12.83%), which indicates that they capture complex multivariate geospatial relationships. The dual-pathway framework is transferable to data-scarce regions, where IV can generate preliminary assessments from minimal landslide inventories to guide strategic field verification campaigns.
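
The IV-guided sampling step can be sketched as follows. The abstract does not state the IV formula, so this sketch assumes the commonly used definition: the natural log of the landslide density within a factor class divided by the landslide density over the whole area. It also assumes the "lowest susceptibility tertile" means the lowest third of cells ranked by summed IV. The function names, the floor for classes without landslides, and the array layout are hypothetical.

```python
import numpy as np

def information_value(class_ids, landslide_mask):
    """IV per class = ln(landslide density in class / landslide density overall)."""
    overall = landslide_mask.mean()
    iv = {}
    for c in np.unique(class_ids):
        density = landslide_mask[class_ids == c].mean()
        # A class with no landslides gives ln(0); the small floor is an assumption.
        iv[c] = float(np.log(max(density, 1e-9) / overall))
    return iv

def sample_non_landslides(total_iv, landslide_mask, n, seed=0):
    """Draw up to n non-landslide cells from the lowest third of summed IV scores."""
    cutoff = np.quantile(total_iv, 1 / 3)
    candidates = np.flatnonzero((total_iv <= cutoff) & ~landslide_mask)
    rng = np.random.default_rng(seed)
    return rng.choice(candidates, size=min(n, candidates.size), replace=False)
```

In use, `total_iv` would be the per-cell sum of class IVs over all causative factors, and the sampled indices would serve as the negative class when training RF, SVM, or XGBoost. Restricting negatives to low-IV cells reduces the chance of labelling an unmapped unstable slope as stable.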
