Unlike previous studies that rely on high-frequency (15-min or hourly) datasets, this study is among the first to use low-frequency (weekly) data to evaluate the performance of linear and nonlinear machine learning (ML) algorithms for predicting biochemical oxygen demand (BOD) and ammonium nitrogen (NH₄⁺-N) in the primary and secondary treatment effluents of the Subiaco Water Resource Recovery Facility (WRRF) in Western Australia. Filter, wrapper, and embedded feature selection methods were compared to identify the approach that gives the highest model performance at the lowest computational cost. The results demonstrate that a reduced set of key features can achieve comparable predictive accuracy with lower computational complexity. For BOD prediction in the primary effluent, a multilayer perceptron (MLP) achieved a root mean square error (RMSE) of 23.50 mg per liter (mg/L) using features selected by mutual information. In the secondary effluent, support vector regression (SVR) with a radial basis function (RBF) kernel, combined with random forest feature selection, yielded the best predictions, with an RMSE of 3.26 mg/L. For NH₄⁺-N, multiple linear regression with backward elimination achieved an RMSE of 2.71 mg/L in the primary effluent, while a random forest with five key predictors achieved an RMSE of 1.51 mg/L in the secondary effluent, indicating high accuracy in NH₄⁺-N prediction. These findings demonstrate that data-driven models can predict BOD and NH₄⁺-N from low-frequency monitoring data, supporting supervisory-level operational decision-making in wastewater treatment plants and near-real-time wastewater quality assessment. Furthermore, generalization analysis indicates that linear models perform more consistently across multiple targets and evaluation metrics.
• Application of low-frequency data for ML-based wastewater prediction.
• First-time prediction of BOD and NH₄⁺-N in primary treatment effluent.
• Comparative evaluation of seven linear and nonlinear ML algorithms.
• Systematic comparison of filter, wrapper, and embedded feature selection methods.
• Linear models demonstrate superior generalization under sparse monitoring conditions.
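The abstract's two recurring ideas, filter-based feature selection and RMSE as the evaluation metric, can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the column names (`cod`, `tss`, `noise`) and the helper `filter_select` are hypothetical, and absolute Pearson correlation is used as a simple stand-in for the mutual-information filter named in the abstract.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (e.g. mg/L)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sd_x * sd_y)


def filter_select(features, target, k):
    """Filter method: rank features by |Pearson r| with the target, keep the top k.

    Hypothetical helper; Pearson correlation is an assumption standing in
    for the mutual-information score mentioned in the abstract.
    """
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic weekly samples (assumption: two years of weekly monitoring).
rng = random.Random(0)
n_weeks = 104
cod = [rng.uniform(300, 600) for _ in range(n_weeks)]   # hypothetical predictor
tss = [rng.uniform(100, 300) for _ in range(n_weeks)]   # hypothetical predictor
noise = [rng.uniform(0, 1) for _ in range(n_weeks)]     # irrelevant column
bod = [0.4 * c + 0.6 * t + rng.gauss(0, 5) for c, t in zip(cod, tss)]

features = {"cod": cod, "tss": tss, "noise": noise}
selected = filter_select(features, bod, k=2)

# Score a naive baseline that always predicts the mean target value.
mean_bod = sum(bod) / n_weeks
baseline_rmse = rmse(bod, [mean_bod] * n_weeks)
print("selected features:", selected)
print("mean-baseline RMSE (mg/L):", round(baseline_rmse, 2))
```

A filter like this scores each feature independently of any model, which is why it is cheap; wrapper methods such as backward elimination instead refit a model for each candidate feature subset.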
Khoshvaght et al. (Fri,) studied this question.
synapsesocial.com/papers/69ada873bc08abd80d5bb665 — DOI: https://doi.org/10.1016/j.engappai.2026.114349
Hoda Khoshvaght
Edith Cowan University
Rizki Permala
National Research and Innovation Agency
Amir Razmjou
Edith Cowan University
Engineering Applications of Artificial Intelligence
Edith Cowan University
Water Corporation of Western Australia (Australia)