What question did this study set out to answer?

This research aims to enhance runoff prediction accuracy by addressing the limitations of traditional methods and exploring new machine learning techniques.

March 4, 2026 | Open Access

Enhanced Runoff Prediction in Zijiang River Basin Using Machine Learning and SHAP-Based Interpretability

Key Points

Developed models including LSTM, CNN-LSTM, TCN, and GBRT for runoff prediction.
Used 13 combinations of meteorological variables to train the models.
Employed SHAP to gauge the contribution of each meteorological factor to prediction accuracy.
The TCN model outperformed other models in predicting daily runoff patterns.
The configuration of runoff, precipitation, evaporation, and temperature was the most effective for predictions.
SHAP analysis indicated that precipitation variables were crucial for prediction accuracy, while temperature and evaporation had complementary roles.
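
The idea of training each model on alternative sets of meteorological inputs can be sketched as below. This is only an illustration: the variable names and the rule that observed runoff is always included are assumptions, and the enumeration yields every possible subset, not the specific 13 combinations used in the study (which this summary does not list).

```python
from itertools import combinations

# Hypothetical variable names; the study's exact 13 combinations are not
# listed in this summary, so this simply enumerates every possible subset.
BASE = "runoff"
OPTIONAL = ["precipitation", "evaporation", "temperature", "relative_humidity"]


def candidate_input_sets(base, optional):
    """Return every input set containing `base` plus any subset of `optional`."""
    sets = []
    for k in range(len(optional) + 1):
        for subset in combinations(optional, k):
            sets.append((base, *subset))
    return sets


input_sets = candidate_input_sets(BASE, OPTIONAL)
for input_set in input_sets:
    print(" + ".join(input_set))
```

Each tuple would then select the input columns for one training run, so that every model is scored on every configuration under the same conditions.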

Abstract

To address the limitations of traditional runoff prediction methods (namely, the oversimplification of meteorological factor selection, ambiguous interactions among core variables, and the disruptive influence of redundant inputs), this study focuses on the Zijiang River Basin as a representative case. A suite of machine learning models, including Long Short-Term Memory Neural Network (LSTM), Convolutional Neural Network (CNN)-LSTM, Temporal Convolutional Network (TCN), and Gradient Boosting Regression Tree (GBRT), was constructed and trained using 13 distinct combinations of meteorological variables. These configurations were systematically evaluated to assess their compatibility with each model in simulating daily runoff patterns. Additionally, the Shapley Additive Explanations (SHAP) algorithm was employed to quantitatively assess the contribution of each factor to predictive accuracy. Among the models tested, the TCN model consistently demonstrated superior performance, particularly in mitigating the effects of irrelevant or redundant features. The GBRT model showed distinctive strengths in accurately predicting peak flow timings. Of all input configurations, the combination of “runoff + precipitation + evaporation + temperature” emerged as the most effective. Findings indicate that the predictive value of individual meteorological variables hinges primarily on their direct correlation with runoff, while the effectiveness of multi-factor schemes depends on the degree of functional integration, specifically the coupling of hydrological recharge, consumption, and regulatory processes. The presence of redundant variables was found to impair model performance unless they contributed to a meaningful synergistic relationship with core inputs. The SHAP analysis further reinforced these insights: precipitation-related variables proved to be the most critical to prediction accuracy, whereas temperature and evaporation served more complementary roles. Notably, the inclusion of relative humidity tended to suppress runoff responses and increased deviation in peak timing estimates. These findings shed light on the nuanced interplay between meteorological input design and model selection, offering a robust foundation for optimizing data-driven runoff prediction frameworks.
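
To make the attribution idea behind SHAP concrete, the sketch below computes exact Shapley values by brute force for a toy runoff function. It is not the study's implementation: the toy model, its coefficients, and the input and baseline values are all hypothetical, and practical SHAP tooling relies on faster approximations because this exhaustive approach scales factorially with the number of features.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values, averaged over all feature orderings.

    Features that have not yet been "revealed" keep their baseline value.
    The cost grows factorially, so this only suits a handful of features.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]          # reveal this feature's real value
            updated = model(current)
            totals[name] += updated - previous  # marginal contribution
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_runoff(f):
    """Hypothetical runoff response; coefficients are for illustration only."""
    return (0.6 * f["precipitation"]
            - 0.2 * f["evaporation"]
            + 0.01 * f["precipitation"] * f["temperature"])


x = {"precipitation": 20.0, "evaporation": 3.0, "temperature": 25.0}
baseline = {"precipitation": 5.0, "evaporation": 2.0, "temperature": 15.0}

phi = shapley_values(toy_runoff, x, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
```

The contributions sum to the difference between the prediction at `x` and the prediction at `baseline`, which is the additivity property that lets per-variable attributions be compared directly. In this toy setup precipitation receives the largest share by construction, so the output illustrates the mechanics only and says nothing about the basin itself.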
