Surface water quality assessment is critical for environmental protection and public health management, yet traditional methods are often time-consuming and costly, limiting their application for real-time monitoring. Machine learning (ML) approaches offer promising alternatives for automated water quality assessment and understanding of key influencing factors. This study employed six ML algorithms to predict water quality grades using comprehensive data from China’s national surface water monitoring network. A dataset comprising 79,015 water quality measurements collected from 1 January to 14 February 2025 was processed with nine physicochemical parameters as input features. The XGBoost model demonstrated superior predictive performance with 99.04% accuracy. Feature importance analysis revealed that nutrient-related parameters (total phosphorus, permanganate index, ammonia nitrogen) consistently ranked as the most critical factors across all models. SHAP analysis provided interpretable explanations of model predictions, revealing grade-specific discrimination patterns where excellent quality waters are primarily distinguished by phosphorus limitation, while severely polluted waters require multi-parameter approaches. This study demonstrates the effectiveness of ML approaches for large-scale water quality assessment and provides a scientific foundation for optimizing monitoring strategies and environmental management decisions in China’s surface water systems.
Li et al. (Fri,) studied this question.
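The feature-importance step described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' pipeline: it swaps in permutation importance (rather than SHAP) and a toy nearest-centroid classifier (rather than XGBoost), and it runs on synthetic data. The feature list, the grade rule inside `make_data`, and all function names are assumptions made for illustration only.

```python
import random

# Hypothetical feature subset; the study used nine physicochemical parameters.
FEATURES = ["total_phosphorus", "permanganate_index", "noise"]


def make_data(n, rng):
    """Synthetic samples whose grade (0, 1, 2) depends only on the first two features."""
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in FEATURES]
        score = 0.7 * row[0] + 0.3 * row[1]  # assumed rule, not from the study
        y.append(0 if score < 0.35 else 1 if score < 0.65 else 2)
        X.append(row)
    return X, y


def fit_centroids(X, y):
    """Toy nearest-centroid classifier: one mean vector per grade."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the grade whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)


def permutation_importance(centroids, X, y, rng):
    """Accuracy drop when one feature column is shuffled, per feature."""
    base = accuracy(centroids, X, y)
    drops = {}
    for j, name in enumerate(FEATURES):
        column = [r[j] for r in X]
        rng.shuffle(column)
        X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
        drops[name] = base - accuracy(centroids, X_perm, y)
    return base, drops


rng = random.Random(0)
X_train, y_train = make_data(2000, rng)
X_test, y_test = make_data(1000, rng)
model = fit_centroids(X_train, y_train)
baseline, importance = permutation_importance(model, X_test, y_test, rng)

print(f"baseline accuracy: {baseline:.3f}")
for name, drop in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: accuracy drop {drop:.3f}")
```

Because the synthetic grade rule weights the first feature most heavily, shuffling it should reduce accuracy the most, while the unrelated `noise` column should have almost no effect. The ranking here reflects only that assumed rule and says nothing about the study's actual data.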