Accurately predicting uranium concentrations in groundwater is vital for protecting public health. Traditional spectroscopic detection methods, although effective and reliable, are time-consuming and require specialized equipment and trained personnel. In addition, uranium is rarely included in routine water quality investigations. We identified Punjab, India, as a likely contaminated region and assessed the risk of uranium contamination using machine learning (ML) models. Uranium was predicted from water quality parameters that are regularly tested under the DWSS Punjab state drinking water surveillance program, including arsenic, cadmium, mercury, nickel, iron, lead, chromium, nitrate, chloride, fluoride, and sulfate. We used a large dataset of 8735 samples to develop regression models based on the Gradient Boosting, Random Forest, and XGBoost algorithms. Ensemble methods such as weighted averaging, stacking, and voting provided competitive alternatives and added model diversity. To account for variation in water quality characteristics, we applied K-means and Gaussian mixture clustering and selected the best regressor for each cluster. Our results demonstrate that integrating clustering, regression, and ensemble learning improved overall predictive performance. The highest performance (R² > 0.90) was achieved by the K-means-based XGBoost regressor. Among the available predictors, sulfate, fluoride, nitrate, and chloride emerged as the most informative variables in the ML framework, consistent with their association with hydrochemical regimes linked to elevated uranium. This study contributes a statistical framework for uranium monitoring and risk reduction, thereby supporting public health and sustainable agriculture in Punjab.
Sudhir et al. studied this question.
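The cluster-then-regress strategy summarized above (partition samples by water quality regime, then keep the best regressor per cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn, uses only K-means (not Gaussian mixtures), compares only Gradient Boosting and Random Forest (XGBoost is omitted to keep dependencies light), and selects each cluster's winner by held-out R². The functions `make_synthetic_data`, `fit_cluster_regressors`, and `predict`, and the four-feature synthetic data, are hypothetical stand-ins for the real hydrochemical predictors and uranium measurements.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Candidate regressors compared within each cluster (assumption: XGBoost omitted).
CANDIDATES = {
    "gbr": lambda: GradientBoostingRegressor(random_state=0),
    "rf": lambda: RandomForestRegressor(n_estimators=100, random_state=0),
}


def make_synthetic_data(seed=0, n_per_regime=200):
    """Hypothetical data: three well-separated 'regimes', each with its own linear target."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [-5, 5, -5, 5]], dtype=float)
    weights = np.array([[1, 2, 0, 0], [0, 0, 3, 1], [2, 0, 0, -1]], dtype=float)
    X_parts, y_parts = [], []
    for center, w in zip(centers, weights):
        Xr = center + rng.normal(size=(n_per_regime, 4))
        X_parts.append(Xr)
        y_parts.append(Xr @ w + rng.normal(scale=0.1, size=n_per_regime))
    return np.vstack(X_parts), np.concatenate(y_parts)


def fit_cluster_regressors(X, y, n_clusters=3, seed=0):
    """Cluster with K-means, then keep the best candidate regressor for each cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    models = {}
    for c in range(n_clusters):
        mask = km.labels_ == c
        Xc, yc = X[mask], y[mask]
        # Hold out part of the cluster to compare candidates fairly.
        X_tr, X_va, y_tr, y_va = train_test_split(
            Xc, yc, test_size=0.25, random_state=seed
        )
        best_name, best_score = None, -np.inf
        for name, make in CANDIDATES.items():
            score = r2_score(y_va, make().fit(X_tr, y_tr).predict(X_va))
            if score > best_score:
                best_name, best_score = name, score
        # Refit the winning model type on every sample in the cluster.
        models[c] = (best_name, CANDIDATES[best_name]().fit(Xc, yc))
    return km, models


def predict(km, models, X):
    """Route each sample to its cluster's regressor."""
    labels = km.predict(X)
    y_hat = np.empty(len(X))
    for c, (_, model) in models.items():
        mask = labels == c
        if mask.any():
            y_hat[mask] = model.predict(X[mask])
    return y_hat


if __name__ == "__main__":
    X, y = make_synthetic_data()
    km, models = fit_cluster_regressors(X, y)
    print({c: name for c, (name, _) in models.items()})
    print("in-sample R2:", r2_score(y, predict(km, models, X)))
```

Routing each sample through its cluster's model lets different hydrochemical regimes use different fitted relationships; in practice the number of clusters and the candidate set would be tuned, and performance would be reported on data held out from the whole pipeline rather than in-sample as in this toy run.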