The study area is the Missouri River Basin, located between the states of Nebraska and Iowa, USA. Water level (WL) and discharge (Q) are key hydrological variables. Accurate prediction of both, over long-term horizons and during short-term extreme events, is essential for water resources and flood risk management. We propose a novel hybrid deep learning architecture, CNN-NLSTM-SA, which integrates a Convolutional Neural Network (CNN) branch for extracting local features and a Multi-State LSTM (NLSTM) branch for capturing long-term temporal dependencies. NLSTM, with its child-parent structure, enhances memory propagation and helps mitigate vanishing and exploding gradients. The outputs of the two branches are fused through a multi-head Self-Attention (SA) mechanism, which lets the model automatically emphasize the most informative representations. The proposed model is evaluated at both long-term and short-term forecasting scales. The long-term scenario leverages extensive historical data and therefore a large training set, whereas the short-term scenario focuses on extreme events with limited training samples. To mimic real-world operational challenges in poorly gauged or data-scarce basins, the model is also tested under varying station-availability conditions using a Leave-n-Station-Out (LnSO) validation strategy. An ablation study comparing CNN-NLSTM-SA with several single- and dual-branch alternatives (CNN-LSTM-SA, LSTM-SA, CNN-SA, and CNN-LSTM) shows that the proposed architecture performs best. Overall, CNN-NLSTM-SA demonstrates strong potential for WL and Q prediction in both data-rich and data-limited environments.

• A novel deep learning architecture, termed CNN-NLSTM-SA, is proposed.
• It integrates Convolutional Neural Networks (CNN) and multi-state LSTMs (NLSTM) through a multi-head Self-Attention (SA) mechanism.
• The proposed model is evaluated for long-term forecasting using extensive historical data.
• Short-term forecasting performance is also assessed, with a focus on extreme events.
• An ablation study with several single- and dual-branch alternatives demonstrates its strong predictive performance.
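The fusion step described above, in which the CNN and NLSTM branch outputs are combined by multi-head self-attention, can be illustrated with a minimal NumPy sketch. The abstract does not say how the two outputs are shaped or merged, so this sketch assumes both branches emit token sequences already projected to a common width `d_model`, concatenates them along the token axis, applies standard scaled dot-product multi-head self-attention, and mean-pools the result. The function names (`fuse_branches`, `multi_head_self_attention`), the random weights, the pooling choice and the linear head are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over tokens x of shape (T, d_model)."""
    t, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split(m):
        # (T, d_model) -> (heads, T, d_head)
        return m.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, T)
    weights = softmax(scores, axis=-1)                    # each row sums to 1
    heads = weights @ v                                   # (heads, T, d_head)
    # Merge heads back to (T, d_model) and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(t, d_model)
    return merged @ w_o, weights


def fuse_branches(cnn_feats, lstm_feats, params, num_heads=4):
    """Hypothetical fusion: stack both branch outputs as tokens, attend, then pool."""
    tokens = np.concatenate([cnn_feats, lstm_feats], axis=0)  # (T_cnn + T_lstm, d_model)
    attended, weights = multi_head_self_attention(tokens, *params, num_heads=num_heads)
    return attended.mean(axis=0), weights  # pooled vector for a regression head


# Usage with random stand-ins for the two branch outputs (assumed shapes).
rng = np.random.default_rng(0)
d_model = 16
params = tuple(rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4))
cnn_feats = rng.normal(size=(10, d_model))   # e.g. 10 CNN feature positions
lstm_feats = rng.normal(size=(10, d_model))  # e.g. NLSTM hidden states for 10 time steps
fused, attn = fuse_branches(cnn_feats, lstm_feats, params)
w_head = rng.normal(size=d_model)            # hypothetical linear head for WL or Q
prediction = float(fused @ w_head)
print(fused.shape, attn.shape)  # (16,) (4, 20, 20)
```

Because attention runs over the concatenated tokens, each head can weight CNN-derived and NLSTM-derived positions against each other, which is one plausible reading of letting the model "emphasize the most informative representations." A trained model would learn these weights rather than sample them.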
Gharehtoragh et al. studied this question.
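The Leave-n-Station-Out (LnSO) validation strategy mentioned in the abstract can be sketched as enumerating every subset of n gauging stations to withhold. The abstract does not state whether withheld stations are removed as model inputs or used as prediction targets, so this sketch only generates the station splits; the station IDs and the function name `leave_n_stations_out` are hypothetical.

```python
from itertools import combinations


def leave_n_stations_out(stations, n):
    """Yield (available, held_out) splits, withholding every combination of n stations."""
    stations = list(stations)
    if not 1 <= n < len(stations):
        raise ValueError("n must be between 1 and len(stations) - 1")
    for held_out in combinations(stations, n):
        # Stations not withheld remain available to the model in this split.
        available = [s for s in stations if s not in held_out]
        yield available, list(held_out)


stations = ["S1", "S2", "S3", "S4"]  # hypothetical gauge IDs
splits = list(leave_n_stations_out(stations, 2))
print(len(splits))  # 6
print(splits[0])    # (['S3', 'S4'], ['S1', 'S2'])
```

Increasing n shrinks the set of available stations, which mimics progressively poorer gauging. For sample-level data tagged with station labels, scikit-learn's `LeavePGroupsOut` performs the same kind of enumeration over groups.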