Abstract: Accurate modeling and forecasting of water quality is an important planning and decision-making tool in water resources management. Water quality prediction is difficult because of various uncertainties and complex non-linear relationships between water quality variables, so a suitable model is needed. In this study, machine learning (ML) was used to build a classification model that identifies potable water from a number of water quality indices given as input. The main ML models used are Random Forest (RF) and Extreme Gradient Boosting (XGBoost). To improve classification accuracy, the hyperparameters of each model were tuned by optimization algorithms in each training iteration. Four optimization algorithms were used: Enhanced Artificial Ecosystem-Based Optimization, Adaptive Differential Evolution with Optional External Archive, the Original Flower Pollination Algorithm, and Original Pareto-Like Sequential Sampling. The results showed that the Enhanced Artificial Ecosystem-Based Optimization–Extreme Gradient Boosting hybrid model achieved the highest Accuracy, Recall, and F1-score, while the Original Pareto-Like Sequential Sampling–Random Forest hybrid model produced the best Precision. Overall, the findings confirm that optimizing hyperparameters and constructing hybrid models improves prediction accuracy compared with the base algorithms. Consequently, a hybrid model based on Enhanced Artificial Ecosystem-Based Optimization combined with Extreme Gradient Boosting is recommended for potable water classification.
Xing et al. (Fri,) studied this question.
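The general idea of tuning hyperparameters with a population-based optimizer and comparing models on Accuracy, Precision, Recall, and F1-score can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it uses a plain differential evolution loop (DE/rand/1/bin) rather than any of the four optimizers named in the abstract, and `toy_validation_score` is a hypothetical placeholder for training an RF or XGBoost classifier and scoring it on validation data. All function and parameter names are illustrative.

```python
import random


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = potable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def differential_evolution(score, bounds, pop_size=12, generations=30,
                           f=0.5, cr=0.9, seed=0):
    """Maximise score(params) with plain DE/rand/1/bin (no adaptation, no archive)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [score(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or d == j_rand:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                else:
                    value = pop[i][d]
                trial.append(min(max(value, lo), hi))  # clip to the search range
            trial_fit = score(trial)
            if trial_fit >= fitness[i]:  # greedy selection keeps the better one
                pop[i], fitness[i] = trial, trial_fit
    best = max(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def toy_validation_score(params):
    """Hypothetical stand-in for 'train a classifier, return validation accuracy'.

    A smooth surface peaking at max_depth = 8 and learning_rate = 0.1,
    chosen only so the example runs without any ML library.
    """
    max_depth, learning_rate = params
    return 1.0 - ((max_depth - 8.0) / 20.0) ** 2 - (learning_rate - 0.1) ** 2


# Assumed search ranges for two illustrative hyperparameters.
SEARCH_BOUNDS = [(2.0, 20.0), (0.01, 0.5)]

if __name__ == "__main__":
    best_params, best_score = differential_evolution(toy_validation_score,
                                                     SEARCH_BOUNDS)
    print("max_depth ~", round(best_params[0]),
          "learning_rate ~", round(best_params[1], 3),
          "score:", round(best_score, 4))
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a real pipeline the placeholder score would be replaced by a function that fits the classifier with the candidate hyperparameters and returns a validation metric, and the four metrics would then be computed on held-out predictions for each optimizer and model pairing.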