The activated sludge process is pivotal in wastewater treatment, with ongoing research into its process control methods. Modeling treatment plants aids in analyzing relationships among variables, supporting fault detection and operational decision-making. However, datasets from real-world treatment plants often contain outliers and missing values due to sensor faults, maintenance activities, and operational disruptions, making outlier handling and data imputation essential for reliable modeling. Existing studies on data imputation for activated sludge systems are often based on synthetic or short datasets, limited method comparisons, or inconsistent evaluation metrics, which reduces their applicability to full-scale operational settings. This study addresses these limitations by presenting a comprehensive, head-to-head comparison of Kohonen Self-Organising Maps (KSOM) with widely used multiple imputation and tree-based methods, namely Amelia II, MICE, missForest, and missRanger. The methods are applied to a real-world multivariate dataset comprising 19 process variables collected over 8.5 years from a full-scale activated sludge wastewater treatment plant (WWTP), containing 39% overall missing data with highly uneven missingness across variables. A validation framework based on held-out observation data is used, and performance is assessed using complementary metrics, including the coefficient of determination (R²), average absolute error (AAE), relative average absolute error (RAAE), mean squared error (MSE), and root mean squared error (RMSE). Results show that KSOM consistently outperforms the competing methods across most variables and evaluation metrics. KSOM achieves near-perfect R² values (≈1) for many process variables, with lower absolute and relative errors, even for variables with very high (>70%) and irregular missingness. These findings highlight KSOM’s robustness in capturing multivariate relationships and cluster structure in complex, operational WWTP data.
Deepak et al. studied this question.
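To make the held-out validation idea concrete, the following is a minimal sketch of how the five reported metrics could be computed for a single process variable. It is not the authors' code. The function names `hold_out` and `imputation_metrics` are hypothetical, and the abstract does not define RAAE, so the sketch assumes it is AAE divided by the mean absolute observed value.

```python
import numpy as np


def hold_out(values, fraction=0.1, seed=0):
    """Hide a random fraction of the known entries of one variable.

    Returns the masked copy and the indices that were hidden, so the
    imputed values can later be compared with the true observations.
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    known = np.flatnonzero(~np.isnan(values))
    n_hold = max(1, int(round(fraction * known.size)))
    held = rng.choice(known, size=n_hold, replace=False)
    masked = values.copy()
    masked[held] = np.nan
    return masked, held


def imputation_metrics(observed, imputed):
    """Compare imputed values with the held-out observations."""
    observed = np.asarray(observed, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    err = imputed - observed
    mse = float(np.mean(err ** 2))
    aae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((observed - observed.mean()) ** 2))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "AAE": aae,
        # Assumption: RAAE = AAE normalised by the mean absolute observation.
        "RAAE": aae / float(np.mean(np.abs(observed))),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
    }
```

In use, one would call `hold_out` on each variable, run the imputation method of interest (KSOM, MICE, missForest, and so on) on the masked data, and then pass the true and imputed values at the held-out indices to `imputation_metrics`.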