Dissolved oxygen (DO) refers to the mass of oxygen dissolved in water, and its concentration is an important indicator of water quality. Maintaining adequate DO levels in surface waters is necessary to sustain public health, aquatic ecosystems, and agricultural water quality worldwide. Oxygen-deficient streams cause fish death, pathogen growth, and reduced self-purification capacity, issues that are particularly severe in densely populated and agriculturally intensive regions. The potential of hydraulic structures to increase the oxygen content of rivers through air entrainment has been recognized previously. Several empirical relationships have been proposed for aeration efficiency, but they often do not yield very good results. One probable reason is the complexity of the flow field and the associated turbulence, which are difficult to account for. Even when these effects are accounted for, applying the resulting relationships at other scales or under other hydraulic conditions may be infeasible. An alternative to these models is missing in the literature. Using machine learning (ML) algorithms, the present work explores the improvement achievable over traditional empirical models. In this study, ensemble ML models, namely random forest (RF), gradient boosting (GB), extreme GB (XGB), and adaptive boosting (ADB), are used to predict the aeration efficiency (E20) of hydraulic structures. The models use GridSearchCV for hyperparameter optimization together with K-fold cross-validation. A data set available in the literature, covering a wide range of flow rates, tailwater depths, and head losses, is used. Model performance is evaluated with several metrics: correlation coefficient (CC), root mean square error (RMSE), Nash–Sutcliffe efficiency (NSE), mean absolute percentage error (MAPE), Willmott’s index of agreement (IA), and percent bias (PBIAS).
Results show that the GB algorithm is the most accurate of the models, with a CC of 0.996 and 0.995 for the training and testing data sets, respectively. The best models (GB and RF) were compared with existing empirical equations, and both outperformed those equations in accuracy and generalization. SHAP (Shapley additive explanations) is used to understand the influence that each input has on the model’s output prediction. Sensitivity analysis using SHAP shows that flow rate is the most important feature. An uncertainty analysis of the aeration efficiency predictions highlighted the GB model’s robustness: it showed the smallest uncertainty band, 0.095, among the ML models and ranked first.
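The six performance metrics named in the abstract can be illustrated with a minimal Python sketch. The formulas below are the commonly used textbook definitions, not necessarily the exact forms adopted by the authors; in particular, the sign convention for PBIAS (observed minus predicted) and the function name evaluation_metrics are assumptions made for illustration.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return CC, RMSE, NSE, MAPE, IA and PBIAS for two equal-length sequences.

    Assumes textbook definitions; the paper may use different conventions.
    Requires non-constant series and nonzero observed values.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Building blocks shared by several metrics.
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))

    # Willmott's index of agreement uses a "potential error" denominator.
    potential = sum(
        (abs(p - mean_o) + abs(o - mean_o)) ** 2
        for o, p in zip(observed, predicted)
    )

    return {
        "CC": cov / math.sqrt(ss_o * ss_p),           # Pearson correlation
        "RMSE": math.sqrt(sq_err / n),
        "NSE": 1.0 - sq_err / ss_o,                    # Nash-Sutcliffe efficiency
        "MAPE": 100.0 / n * sum(
            abs((o - p) / o) for o, p in zip(observed, predicted)
        ),
        "IA": 1.0 - sq_err / potential,
        # Assumed sign convention: positive means under-prediction.
        "PBIAS": 100.0 * sum(o - p for o, p in zip(observed, predicted))
        / sum(observed),
    }


# Hypothetical values, not data from the study.
example = evaluation_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

A perfect prediction gives CC, NSE, and IA equal to 1 and RMSE, MAPE, and PBIAS equal to 0, which is a quick sanity check when comparing models on the same test set.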
Ashwini Tiwari
Indian Institute of Technology Roorkee
K. S. Hari Prasad
C. S. P. Ojha
Journal of Environmental Engineering
Tiwari et al. studied this question.
synapsesocial.com/papers/69c0ddb8fddb9876e79c1253 — DOI: https://doi.org/10.1061/joeedu.eeeng-8415