To balance predictive accuracy with model interpretability in landslide susceptibility mapping, this study integrates the Information Value (IV) method with three machine learning algorithms, Random Forest (RF), Support Vector Machine (SVM), and XGBoost, into a hybrid framework for landslide susceptibility assessment. The main methodological contributions are: 1) a sequential three-stage feature selection protocol combining correlation analysis, Variance Inflation Factor (VIF), and entropy-based Information Gain, which preserves geologically significant variables with non-linear relationships; 2) an IV-guided non-landslide sampling strategy that reduces spatial bias by restricting sampling to the lowest susceptibility tertile; and 3) a validation framework integrating statistical metrics, spatial distribution analysis, and spatial concordance verification using 372 GPS-documented inventory sites (74.5%), overlaid post hoc on the final susceptibility maps. With slope, aspect, elevation, curvature, TWI, annual rainfall, geology, soil type, NDVI, LULC, and proximity to roads, rivers, and faults selected as causative factors through this screening, XGBoost achieved the highest performance, with 96% accuracy, an AUC-ROC of 0.991 (95% CI: 0.979–0.995), and a 96% F1-score, compared with SVM (91% accuracy; AUC-ROC 0.944, 95% CI: 0.932–0.956) and RF (82% accuracy; AUC-ROC 0.888, 95% CI: 0.873–0.903). Under Jenks natural breaks classification, the machine learning models allocated substantially larger proportions to the Very High susceptibility class (XGBoost: 39.75%, RF: 26.33%, SVM: 39.33%) than the IV model (12.83%), reflecting their capacity to capture complex multivariate geospatial relationships. The dual-pathway framework is transferable to data-scarce regions, where IV can generate preliminary assessments from minimal landslide inventories to guide strategic field verification campaigns.
Joshi et al. (Tue,) studied this question.
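The IV-guided sampling idea can be illustrated with a minimal sketch. It assumes the commonly used Information Value formulation, in which the IV of a factor class is the natural logarithm of the landslide density within that class divided by the landslide density over the whole map; the abstract does not state the exact formula used in the study. The function names, the `floor` value for landslide-free classes, and the per-cell array layout are all hypothetical choices made for illustration.

```python
import numpy as np


def information_value(factor_classes, landslide_mask, floor=1e-6):
    """Per-class IV = ln(class landslide density / overall landslide density).

    Assumed formulation (not taken from the abstract). `floor` is a
    hypothetical guard that avoids log(0) for classes without landslides.
    """
    factor_classes = np.asarray(factor_classes)
    landslide_mask = np.asarray(landslide_mask, dtype=bool)
    overall_density = landslide_mask.mean()
    iv = {}
    for cls in np.unique(factor_classes):
        in_class = factor_classes == cls
        class_density = landslide_mask[in_class].mean()
        iv[cls.item()] = float(np.log(max(class_density, floor) / overall_density))
    return iv


def sample_non_landslides(total_iv, landslide_mask, n_samples, seed=0):
    """Draw non-landslide cell indices from the lowest IV tertile only."""
    total_iv = np.asarray(total_iv, dtype=float)
    landslide_mask = np.asarray(landslide_mask, dtype=bool)
    # Upper bound of the lowest third of the summed IV scores
    threshold = np.quantile(total_iv, 1 / 3)
    # Candidate cells: low susceptibility and not a mapped landslide
    candidates = np.flatnonzero((total_iv <= threshold) & ~landslide_mask)
    rng = np.random.default_rng(seed)
    n = min(n_samples, candidates.size)
    return rng.choice(candidates, size=n, replace=False)
```

In practice the per-class IV values of every causative factor would be summed per cell to obtain `total_iv` before sampling; restricting the draw to the lowest tertile is what keeps absence samples away from terrain that IV already flags as landslide-prone.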