What question did this study set out to answer?

The goal is to determine the best machine learning model for predicting daily streamflow in a subtropical monsoon watershed.

February 8, 2026 | Open Access

Comparative assessment of machine learning models for daily streamflow prediction in a subtropical monsoon watershed

Key Points

Compared seven machine learning models including LSTM, ANN, and XGB for streamflow prediction.
Performed feature importance analysis to identify key predictors of streamflow.
Conducted high-flow evaluations to assess model performance under extreme conditions.
LSTM achieved the highest performance, with NSE and KGE of 0.95.
Under high-flow conditions, LSTM was more accurate than tree-based models, underestimating flood peaks by the smallest margin (7 to 20% versus 30 to 50%).
XGB feature importance identified upstream flow as the most important predictor, with an importance score of 0.373.

Abstract

Accurate streamflow prediction is critical for flood warning and water resources management in subtropical monsoon watersheds, yet optimal model selection remains challenging. This study compared seven machine learning models for daily streamflow prediction in the Boluo Watershed, South China: Linear Regression (LR), Gradient Boosting Regressor, Artificial Neural Network (ANN), Random Forest, Extra Trees Regressor, XGBoost (XGB), and Long Short-Term Memory (LSTM). Results demonstrated that LSTM achieved superior performance with NSE and KGE of 0.95, followed by ANN and LR. High-flow evaluation revealed that LSTM maintained robust performance under extreme conditions, achieving NSE of 0.86, 0.80, and 0.45 for flows exceeding the 90th, 95th, and 99th percentiles, respectively. For flood peaks, LSTM showed the smallest underestimation of 7 to 20%, compared to 30 to 50% for tree-based models. Feature importance analysis revealed upstream flow from Lingxia Station as the dominant predictor (importance of 0.373 for XGB), reflecting watershed memory effects whereby streamflow is predominantly controlled by antecedent hydrological conditions. Residual analysis identified pronounced heteroscedasticity, with prediction errors increasing under high-flow conditions. These findings demonstrate that temporal memory mechanisms provide substantial advantages for streamflow prediction under extreme conditions, offering guidance for model selection in operational flood forecasting systems.
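The abstract reports NSE, KGE, percentile-based high-flow scores, and flood-peak underestimation without showing how they are computed. The sketch below illustrates one common way to compute them. It is not the authors' code: it assumes the standard Nash-Sutcliffe definition, the original 2009 form of the Kling-Gupta efficiency, and high-flow thresholds taken from percentiles of the observed series. All function names are hypothetical.

```python
import math
import statistics


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = statistics.fmean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form, an assumption; the page does not name the variant)."""
    mean_obs, mean_sim = statistics.fmean(obs), statistics.fmean(sim)
    sd_obs, sd_sim = statistics.pstdev(obs), statistics.pstdev(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (sd_obs * sd_sim)   # linear correlation
    alpha = sd_sim / sd_obs       # variability ratio
    beta = mean_sim / mean_obs    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def high_flow_nse(obs, sim, percentile):
    """NSE restricted to days whose observed flow exceeds the given percentile (1 to 99)."""
    cuts = statistics.quantiles(obs, n=100, method="inclusive")
    threshold = cuts[percentile - 1]
    pairs = [(o, s) for o, s in zip(obs, sim) if o > threshold]
    hi_obs, hi_sim = zip(*pairs)
    return nse(hi_obs, hi_sim)


def peak_underestimation_pct(obs, sim):
    """Percent shortfall of the simulated flow on the day of the observed peak."""
    i = max(range(len(obs)), key=obs.__getitem__)
    return 100.0 * (obs[i] - sim[i]) / obs[i]
```

Calling `high_flow_nse(obs, sim, 90)`, then with `95` and `99`, would reproduce the style of evaluation described above. Because each higher threshold leaves fewer days and a narrower spread of observed flows, NSE tends to fall as the percentile rises, which is consistent with the reported drop from 0.86 to 0.45.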

Authors

Zhi Zhang

Yusha Xiao

Runting Chen

Journals

Scientific Reports

Institutions

Sun Yat-sen University

Zhaoqing University
