Water quality assessment plays a vital role in environmental monitoring and resource management. This study aims to improve predictive modeling of the Water Quality Index (WQI) by combining statistical diagnostics with machine learning techniques. Water quality data collected from six river locations in Malaysia undergo a series of statistical analyses: assumption testing (Shapiro–Wilk and Breusch–Pagan tests), diagnostic evaluations, feature importance analysis, and principal component analysis (PCA). Decision tree regression (DTR) and autoregressive integrated moving average (ARIMA) models are employed for regression, while random forest is used for classification. Learning curve analysis is conducted to evaluate model performance and generalization. The results indicate that dissolved oxygen (DO) and ammoniacal nitrogen (AN) are the most influential parameters, with normalized importance scores of 1.000 and 0.565, respectively. The Breusch–Pagan test identifies significant heteroscedasticity (p-value = 3.138e−115), while the Shapiro–Wilk test confirms non-normality (p-value = 0.0). PCA reduces dimensionality while preserving 95% of the dataset variance, improving computational efficiency. Among the regression models, ARIMA demonstrates better predictive accuracy than DTR. Random forest achieves high classification performance. Learning curve analysis reveals overfitting in the regression model, suggesting the need for hyperparameter tuning, while the classification model generalizes better as the amount of training data increases. Strong correlations among key parameters indicate potential multicollinearity, emphasizing the need for careful feature selection.
These findings highlight the synergy between statistical pre-processing and machine learning, offering a more accurate and efficient approach to water quality prediction for informed environmental policy and real-time monitoring systems.
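The statistical pre-processing steps named above (Shapiro–Wilk, Breusch–Pagan, and PCA retaining 95% of the variance) can be illustrated with a minimal Python sketch. This is not the authors' code: the synthetic data, the function names `synthetic_data`, `breusch_pagan`, and `diagnose`, and the use of a plain least-squares fit to obtain residuals are all assumptions made for illustration. The Breusch–Pagan statistic is computed here in its studentized (Koenker) form, which may differ from the variant used in the study.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def synthetic_data(n=500, seed=0):
    """Hypothetical stand-in for water quality features (not the study's data)."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(1.0, 10.0, n)
    base = np.column_stack([x0, rng.normal(size=n), rng.normal(size=n)])
    # Near-duplicate columns mimic the strongly correlated parameters.
    echoes = base + 0.1 * rng.normal(size=(n, 3))
    X = np.column_stack([base, echoes])
    # Noise whose spread grows with x0, so the residuals are heteroscedastic.
    y = 2.0 * x0 - base[:, 1] + rng.normal(size=n) * 0.5 * x0
    return X, y


def breusch_pagan(residuals, X):
    """Studentized Breusch-Pagan LM test; X must not contain an intercept column."""
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])
    u2 = residuals ** 2
    # Auxiliary regression of squared residuals on the predictors.
    beta, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    fitted = Z @ beta
    r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    lm = n * r2
    return lm, stats.chi2.sf(lm, df=k)


def diagnose(X, y, variance_kept=0.95):
    """Run both assumption tests on OLS residuals, then fit PCA on scaled features."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    residuals = y - Z @ beta

    _, shapiro_p = stats.shapiro(residuals)
    _, bp_p = breusch_pagan(residuals, X)

    # A float n_components keeps the fewest components reaching that variance share.
    pca = PCA(n_components=variance_kept, svd_solver="full")
    pca.fit(StandardScaler().fit_transform(X))

    return {
        "shapiro_p": float(shapiro_p),
        "breusch_pagan_p": float(bp_p),
        "n_components": int(pca.n_components_),
        "variance_retained": float(pca.explained_variance_ratio_.sum()),
    }


if __name__ == "__main__":
    X, y = synthetic_data()
    for name, value in diagnose(X, y).items():
        print(f"{name}: {value}")
```

A small p-value from either test signals a violated linear-model assumption, which is one reason to prefer tree-based models such as those used in the study. Standardizing before PCA keeps parameters measured in different units from dominating the components.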
Amar Lokman
Universiti Sains Islam Malaysia
Wan Zakiah Wan Ismail
Universiti Sains Islam Malaysia
Nor Azlina Ab. Aziz
Multimedia University
Algorithms
Multimedia University
Universiti Sains Islam Malaysia
Lokman et al. studied this question.
synapsesocial.com/papers/68a3656a0a429f797332b979 — DOI: https://doi.org/10.3390/a18080494