To address the limitations of traditional runoff prediction methods—namely, the oversimplification of meteorological factor selection, ambiguous interactions among core variables, and the disruptive influence of redundant inputs—this study focuses on the Zijiang River Basin as a representative case. A suite of machine learning models, including Long Short-Term Memory Neural Network (LSTM), Convolutional Neural Network (CNN)-LSTM, Temporal Convolutional Network (TCN), and Gradient Boosting Regression Tree (GBRT), was constructed and trained using 13 distinct combinations of meteorological variables. These configurations were systematically evaluated to assess their compatibility with each model in simulating daily runoff patterns. Additionally, the Shapley Additive Explanations (SHAP) algorithm was employed to quantitatively assess the contribution of each factor to predictive accuracy. Among the models tested, the TCN model consistently demonstrated superior performance, particularly in mitigating the effects of irrelevant or redundant features. The GBRT model showed distinctive strengths in accurately predicting peak flow timings. Of all input configurations, the combination of “runoff + precipitation + evaporation + temperature” emerged as the most effective. Findings indicate that the predictive value of individual meteorological variables hinges primarily on their direct correlation with runoff, while the effectiveness of multi-factor schemes depends on the degree of functional integration—specifically, the coupling of hydrological recharge, consumption, and regulatory processes. The presence of redundant variables was found to impair model performance unless they contributed to a meaningful synergistic relationship with core inputs. The SHAP analysis further reinforced these insights: precipitation-related variables proved to be the most critical to prediction accuracy, whereas temperature and evaporation served more complementary roles. 
Notably, including relative humidity tended to suppress the simulated runoff response and increased the deviation in estimated peak timing. These findings clarify how meteorological input design interacts with model selection and provide a basis for optimizing data-driven runoff prediction frameworks.
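The attribution idea behind the SHAP analysis can be illustrated with a minimal sketch of the exact Shapley value computation on which SHAP is based. This is not the study's implementation and does not use the SHAP library. The function `toy_skill`, the values in `BASE_GAIN`, and the `SYNERGY` bonus are hypothetical numbers chosen only to mimic a case in which precipitation dominates, evaporation and temperature play complementary roles, and relative humidity slightly degrades skill.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a set function `value`.

    `value` maps a frozenset of feature names to a score (for example,
    a model skill metric obtained with that input combination).
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical skill gains per input; NOT results from the study.
BASE_GAIN = {
    "precipitation": 0.30,
    "evaporation": 0.05,
    "temperature": 0.04,
    "relative_humidity": -0.02,
}
# Hypothetical bonus when recharge (precipitation) and consumption
# (evaporation) are both present, standing in for a synergistic coupling.
SYNERGY = 0.06


def toy_skill(subset):
    """Toy skill score of an input combination (assumed, illustrative)."""
    score = sum(BASE_GAIN[f] for f in subset)
    if {"precipitation", "evaporation"} <= subset:
        score += SYNERGY
    return score


features = list(BASE_GAIN)
phi = shapley_values(features, toy_skill)

for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} {contribution:+.3f}")
```

In this toy setting the synergy bonus is split equally between precipitation and evaporation, and the contributions sum to the skill of the full input set. Exact enumeration is feasible only for a handful of inputs, which is why practical SHAP implementations rely on model-specific or sampling-based approximations.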
Ma et al. (Mon,) studied this question.