Soil salinity severely threatens ecosystems and agriculture worldwide, making accurate monitoring an ongoing priority. A critical open challenge is how to use multi-source datasets efficiently, so that monitoring accuracy improves while computational cost stays low. This study evaluated several modeling strategies: full-dataset modeling and feature selection by variance inflation factor (VIF), Boruta, particle swarm optimization, ant colony optimization, and recursive feature elimination (RFE). The strategies were validated in two contrasting regions (Almaty, Kazakhstan; Shandong, China) and with multiple algorithms, including linear regression, partial least squares regression, extreme gradient boosting, k-nearest neighbor, and random forest (RF). Topsoil (0–20 cm) electrical conductivity was then inverted using the optimal method. The results indicate that the number of input features substantially affects model performance. Feature selection is indispensable at the regional scale: RFE outperforms full-dataset modeling (R² improves by up to 0.28, while RMSE decreases by 2.21 dS m−1), whereas VIF performs worst. Transferability is also demonstrated in Almaty and Shandong. In addition, the RF algorithm shows superior performance in soil salinity mapping (overall accuracy = 0.73; kappa coefficient = 0.65). The RFE and SHAP results highlight CRSI, BI, and MSAVI2 as particularly important predictors of soil salinity in the study area. Collectively, these findings underline the importance of feature optimization and interpretability when mapping soil attributes from integrated multi-source remote sensing data.
Shi et al. (Mon,) studied this question.
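The abstract reports four evaluation metrics: R² and RMSE for the regression models, and overall accuracy and the kappa coefficient for the salinity map. As a minimal sketch of what these numbers measure, the block below implements their standard textbook definitions in plain Python. It assumes the study used these conventional definitions (the abstract does not say otherwise); the function names and the small input lists are made up for illustration and are not data from the study.

```python
from collections import Counter
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (e.g. dS/m)."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return sqrt(squared / len(y_true))


def overall_accuracy(labels_true, labels_pred):
    """Fraction of samples whose predicted class matches the reference class."""
    hits = sum(t == p for t, p in zip(labels_true, labels_pred))
    return hits / len(labels_true)


def cohen_kappa(labels_true, labels_pred):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(labels_true)
    p_observed = overall_accuracy(labels_true, labels_pred)
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical electrical conductivity values (dS/m), for illustration only.
ec_true = [1.0, 2.0, 3.0, 4.0]
ec_pred = [1.5, 2.0, 2.5, 4.0]

# Hypothetical salinity classes, for illustration only.
class_true = ["low", "low", "mid", "mid", "high", "high"]
class_pred = ["low", "mid", "mid", "mid", "high", "low"]

print(f"R2    = {r2_score(ec_true, ec_pred):.3f}")                   # R2    = 0.900
print(f"RMSE  = {rmse(ec_true, ec_pred):.3f}")                       # RMSE  = 0.354
print(f"OA    = {overall_accuracy(class_true, class_pred):.3f}")     # OA    = 0.667
print(f"kappa = {cohen_kappa(class_true, class_pred):.3f}")          # kappa = 0.500
```

In the toy classification, two thirds of the samples are labeled correctly, but one third agreement would be expected by chance alone, so kappa is 0.5. The same correction explains why the reported kappa (0.65) is lower than the reported overall accuracy (0.73).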