What question did this study set out to answer?

The research aims to predict uranium concentrations in groundwater using water quality parameters and machine learning techniques.

March 30, 2026

Predicting Uranium in Punjab’s Groundwater Using Common Anions: A Machine Learning Approach

Key Points

The research aims to predict uranium concentrations in groundwater using water quality parameters and machine learning techniques.
Utilized a dataset of 8735 samples to develop machine learning regression models.
Employed Gradient Boosting, Random Forest, and XGBoost algorithms for uranium prediction.
Introduced K-means and Gaussian mixture clustering to account for variation in water quality characteristics, selecting the best regressor for each cluster.
Applied ensemble methods such as weighted averaging, stacking, and voting for improved predictions.
Achieved high predictive performance (R² > 0.90) with the K-means-based XGBoost regressor.
Sulfate, fluoride, nitrate, and chloride were identified as the most informative predictors.
Showed that combining clustering, regression, and ensemble learning improved overall predictive performance.
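The key points cite R² as the performance measure and weighted averaging as one of the ensemble methods. The sketch below shows both ideas in plain Python. It rests on stated assumptions: the summary does not say how the study chose its ensemble weights, so weighting each model by its own R² is a hypothetical choice, and the numbers are toy values, not data from the study. In real use the weights should come from a held-out validation set rather than the data used for final scoring.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def weighted_average(predictions: Sequence[Sequence[float]], weights: Sequence[float]) -> list[float]:
    """Blend per-model predictions using weights normalised to sum to 1."""
    total = sum(weights)
    norm = [w / total for w in weights]
    return [
        sum(w * preds[i] for w, preds in zip(norm, predictions))
        for i in range(len(predictions[0]))
    ]


# Toy numbers for illustration only (not data from the study).
y_true = [2.0, 4.0, 6.0, 8.0]
model_preds = [
    [2.5, 3.5, 6.5, 7.5],  # hypothetical model A: R^2 = 0.95
    [1.5, 4.5, 5.5, 8.5],  # hypothetical model B: R^2 = 0.95
]
weights = [r2_score(y_true, p) for p in model_preds]  # hypothetical: weight each model by its R^2
blend = weighted_average(model_preds, weights)
print(blend, r2_score(y_true, blend))  # [2.0, 4.0, 6.0, 8.0] 1.0
```

In the toy case the two models err in opposite directions, so the blend cancels their errors. Real ensembles rarely gain this much. For trained models, scikit-learn's `sklearn.metrics.r2_score` and `sklearn.ensemble.VotingRegressor` (which accepts a `weights` argument) provide the same pieces.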

Abstract

Accurately predicting uranium concentration in groundwater is vital for protecting public health. Traditional spectroscopic detection methods, although effective and reliable, are time-consuming and require specialized equipment and trained personnel. In addition, uranium is rarely included in routine water quality investigations. We identified Punjab, India, as a probable contaminated region and assessed the risk of uranium contamination using machine learning (ML) models. Uranium was predicted from water quality parameters that are regularly tested within the DWSS Punjab state drinking water surveillance program, including arsenic, cadmium, mercury, nickel, iron, lead, chromium, nitrate, chloride, fluoride, and sulfate. We used a large dataset of 8735 samples to develop regression models based on the Gradient Boosting, Random Forest, and XGBoost algorithms. Ensemble methods such as weighted averaging, stacking, and voting provided competitive alternatives and added model diversity. To account for variations in water quality characteristics, we introduced K-means and Gaussian mixture clustering and selected the best regressor for each cluster. Our results demonstrate that integrating clustering, regression, and ensemble learning improved overall predictive performance. The highest performance (R² > 0.90) was achieved by the K-means-based XGBoost regressor. Among the available predictors, sulfate, fluoride, nitrate, and chloride emerged as the most informative variables in the ML framework, consistent with their association with hydrochemical regimes linked to elevated uranium. This study contributes a statistical framework for uranium monitoring and risk reduction, supporting public health and sustainable agriculture in Punjab.
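The abstract describes a two-stage design: cluster the samples, then pick the best regressor for each cluster. The sketch below illustrates that routing idea in plain Python using K-means (Lloyd's algorithm) only. It rests on several assumptions: the number of clusters, the feature vectors, and the selection metric (mean squared error here) are not given in this summary. The candidate "models" are hypothetical stand-in functions, not the study's Gradient Boosting, Random Forest, or XGBoost regressors. The Gaussian mixture variant is not shown.

```python
import math
import random
from typing import Callable, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]  # hypothetical stand-in for a trained regressor


def nearest(centroids: Sequence[Vector], p: Vector) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(centroids[j], p))


def kmeans(points: list[Vector], k: int, iters: int = 50, seed: int = 0) -> list[list[float]]:
    """Plain Lloyd's algorithm; returns the k cluster centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: list[list[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def mse(model: Model, xs: Sequence[Vector], ys: Sequence[float]) -> float:
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_cluster_router(xs, ys, candidates: dict[str, Model], k: int):
    """Cluster the samples, then keep the lowest-error candidate for each cluster."""
    centroids = kmeans(list(xs), k)
    labels = [nearest(centroids, x) for x in xs]
    # Fallback for a cluster with no members: the best model on all the data.
    overall = min(candidates, key=lambda name: mse(candidates[name], xs, ys))
    best: list[str] = []
    for j in range(k):
        cx = [x for x, lab in zip(xs, labels) if lab == j]
        cy = [y for y, lab in zip(ys, labels) if lab == j]
        if not cx:
            best.append(overall)
            continue
        best.append(min(candidates, key=lambda name: mse(candidates[name], cx, cy)))

    def predict(x: Vector) -> float:
        # Route the new sample to its nearest centroid and use that cluster's model.
        return candidates[best[nearest(centroids, x)]](x)

    return centroids, best, predict


# Toy data (illustrative only): two well-separated groups that follow different rules.
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
ys = [2 * x[0] if x[0] < 5 else -x[0] for x in xs]
candidates = {
    "slope_pos": lambda x: 2 * x[0],  # hypothetical model suited to the first group
    "slope_neg": lambda x: -x[0],     # hypothetical model suited to the second group
}
centroids, best, predict = fit_cluster_router(xs, ys, candidates, k=2)
```

Here each group ends up with the candidate that fits it exactly, so `predict((0.5, 0.5))` uses `slope_pos` and `predict((10.5, 10.5))` uses `slope_neg`. With real data one would more likely use `sklearn.cluster.KMeans` or `sklearn.mixture.GaussianMixture` for the clustering step and select among trained regressors on a validation split.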
