Accurate prediction of the vertical distribution of soil organic matter (SOM) is essential for assessing the carbon cycle in arid regions. In this study, predictive models for SOM at depths of 0–100 cm were developed using 190 soil samples from 48 profiles in the Urumqi River Basin. The models integrate visible near‑infrared (VIS–NIR) spectra with multi‑source environmental covariates. Random forest (RF) and convolutional neural network (CNN) models were compared, and the SHapley Additive exPlanations (SHAP) framework was used to interpret the factors governing SOM in the optimal model. Incorporating environmental covariates significantly improved model performance, particularly in the surface layers (0–40 cm). The CNN model achieved its highest accuracy in the 0–20 cm layer (R² = 0.97) and the 80–100 cm layer (R² = 0.95). SHAP analysis further revealed that the dominant drivers shift with depth: vegetation indices accounted for a cumulative 39.1% of the total contribution in the surface layers, whereas soil properties and topographic features became more influential in deeper horizons. These findings provide empirical support for the classical theory that SOM profile formation is co‑regulated by climate, biota, and parent material, and offer an effective framework for multi‑depth soil carbon estimation in arid landscapes.
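Group-level SHAP percentages, such as the 39.1% attributed to vegetation indices, are commonly obtained by taking each covariate's mean absolute SHAP value, summing these within a covariate group, and dividing by the total over all covariates. The abstract does not state the exact aggregation rule, so the sketch below assumes this one. The function name `grouped_shap_share`, the group labels, and the toy SHAP values are hypothetical. In practice the SHAP matrix would come from an explainer applied to the fitted model for one depth layer.

```python
from collections import defaultdict


def grouped_shap_share(shap_values, feature_groups):
    """Return each covariate group's percentage of the total mean |SHAP|.

    Assumed aggregation rule (not confirmed by the study):
    group share = sum of mean |SHAP| over the group's features / total * 100.

    shap_values: list of rows, one per sample, each holding per-feature SHAP values.
    feature_groups: hypothetical group label for each feature column.
    """
    n_samples = len(shap_values)
    n_features = len(feature_groups)
    # Global importance of each feature: mean absolute SHAP value over samples.
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(n_features)
    ]
    total = sum(mean_abs)
    if total == 0:
        raise ValueError("all SHAP values are zero; shares are undefined")
    # Accumulate each feature's percentage into its group.
    shares = defaultdict(float)
    for j, group in enumerate(feature_groups):
        shares[group] += 100.0 * mean_abs[j] / total
    return dict(shares)


# Toy example with made-up values: 2 samples, 4 covariates in 3 hypothetical groups.
toy_shap = [[0.4, -0.2, 0.1, -0.3],
            [-0.6, 0.2, 0.3, 0.1]]
toy_groups = ["vegetation", "vegetation", "soil", "topography"]
print(grouped_shap_share(toy_shap, toy_groups))
```

Using absolute values keeps features whose effects push predictions up for some samples and down for others from cancelling out, so the group shares always sum to 100%. Computing the shares separately for each depth layer would show how the dominant drivers shift with depth.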
Dai et al. studied this question.