Accurate forecasting of the Water Quality Index (WQI) is important for water resource management and public health protection. This study proposes a WQI forecasting approach that combines a stacked regression ensemble with SHAP (SHapley Additive exPlanations), an Explainable Artificial Intelligence (XAI) technique. The model was developed on a dataset of 1,987 water quality samples from Indian rivers (2005-2014) and stacks six optimized machine learning algorithms (XGBoost, CatBoost, Random Forest, Gradient Boosting, Extra Trees, and AdaBoost) as base learners, with Linear Regression as the meta-learner. Seven normalized physicochemical parameters served as predictors, and the WQI computed with the weighted arithmetic method served as the response variable. The stacked ensemble outperformed every individual model on all evaluation metrics, with R² = 0.9952, adjusted R² = 0.9947, MAE = 0.7637, and RMSE = 1.0704. Among the individual models, CatBoost and Gradient Boosting showed the strongest standalone performance: CatBoost achieved R² = 0.9894, adjusted R² = 0.9883, MAE = 0.8399, and RMSE = 1.5905, while Gradient Boosting attained R² = 0.9907, adjusted R² = 0.9898, MAE = 1.0759, and RMSE = 1.4898. SHAP analysis identified dissolved oxygen (DO), biochemical oxygen demand (BOD), conductivity, and pH as the parameters contributing most to the WQI predictions. The integrated framework improves on existing approaches by combining high predictive accuracy with model interpretability, and it can support real-time environmental monitoring. It supports anticipatory environmental surveillance, automated policy frameworks, and stakeholder confidence in the sustainability of water resources.
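For readers who want to see how the four evaluation metrics reported above relate to one another, the following is a minimal sketch using their standard textbook definitions. It is not the authors' evaluation code: the function name `regression_metrics`, the toy WQI values, and the choice of `n_predictors=7` (taken from the seven predictors mentioned in the abstract) are illustrative assumptions.

```python
import math


def regression_metrics(y_true, y_pred, n_predictors):
    """Return MAE, RMSE, R^2 and adjusted R^2 using the standard definitions.

    Hypothetical helper for illustration; not taken from the study.
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("adjusted R^2 needs more samples than predictors + 1")

    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error and root mean squared error.
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # R^2 = 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    # Adjusted R^2 penalizes R^2 for the number of predictors p:
    # 1 - (1 - R^2) * (n - 1) / (n - p - 1).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

    return {"mae": mae, "rmse": rmse, "r2": r2, "adj_r2": adj_r2}


# Toy data (made up for illustration): predictions are off by exactly 1 unit.
observed_wqi = [50, 60, 70, 80, 90, 55, 65, 75, 85, 95]
predicted_wqi = [51, 59, 71, 79, 91, 54, 66, 74, 86, 94]

metrics = regression_metrics(observed_wqi, predicted_wqi, n_predictors=7)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the adjustment factor (n - 1) / (n - p - 1) is greater than 1 whenever p > 0, adjusted R² is lower than R² for any imperfect fit, which is consistent with the ordering of the two values reported for each model.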
Choudhary et al. studied this question.