ABSTRACT
[Graphical abstract: workflow diagram showing data preprocessing, machine learning models, hybrid ensemble models, SHAP explainability, and the BOD prediction framework for water-quality analysis.]
This study investigates the prediction of biochemical oxygen demand (BOD) levels in India's groundwater, lakes, and rivers using hybrid machine learning (ML) and explainable AI techniques. Traditional water-quality assessment approaches are often time-consuming and computationally expensive, which motivates the need for efficient predictive frameworks. A comprehensive dataset collected from multiple Indian water bodies between 2017 and 2021 was analysed using ML algorithms including Random Forest, Support Vector Regressor, Gradient Boosting, XGBoost, and Multi-Layer Perceptron. Hybrid ensemble approaches incorporating stacking and feature-engineering techniques were then developed to improve predictive performance. The hybrid models achieved higher predictive accuracy and stability than the standalone ML models, and statistical significance testing confirmed the superiority of the ensemble approaches. To enhance transparency and interpretability, SHAP (SHapley Additive exPlanations) analysis was applied to identify influential water-quality parameters, including fecal coliform, total coliform, and conductivity. The proposed framework provides an interpretable and computationally efficient approach to water-quality prediction and supports evidence-based decision-making for sustainable water-resource management.
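The SHAP analysis mentioned in the abstract attributes each prediction to individual input features through Shapley values. The following is a minimal sketch of that idea, not the study's implementation: it computes exact Shapley values for a single sample by enumerating every feature coalition, with absent features replaced by a baseline value. The model `toy_bod_model`, its coefficients, and the sample and baseline values are hypothetical and chosen only for illustration; they are not taken from the dataset or the fitted models. In practice the `shap` library estimates these values far more efficiently than exhaustive enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, enumerating all feature coalitions."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features in the coalition keep their observed value;
        # all other features are set to the baseline value.
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_bod_model(f):
    # Hypothetical stand-in for a trained BOD regressor (illustrative coefficients).
    return (
        1.0
        + 0.002 * f["fecal_coliform"]
        + 0.0005 * f["total_coliform"]
        + 0.001 * f["conductivity"]
        + 1e-6 * f["fecal_coliform"] * f["conductivity"]
    )


# Hypothetical sample and baseline (for example, dataset means); not real measurements.
sample = {"fecal_coliform": 500.0, "total_coliform": 1200.0, "conductivity": 800.0}
baseline = {"fecal_coliform": 100.0, "total_coliform": 400.0, "conductivity": 300.0}

phi = shapley_values(toy_bod_model, sample, baseline)

# Rank features by the magnitude of their contribution to this prediction.
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the model's prediction for the sample and its prediction for the baseline, which is the property that makes Shapley-based attributions additive. Exhaustive enumeration grows exponentially with the number of features, so it is suitable only for small illustrative cases such as this one.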
Lokhandwala et al. studied this question.