Accurately predicting student dropout in Massive Open Online Courses (MOOCs) remains a critical challenge in educational data mining. While Spatio-Temporal Graph Neural Networks (STGNNs) have shown promise, established frameworks typically rely on first-order temporal dependencies, recursively deriving the current state solely from its immediate predecessor. We argue that such recursive compression fails to capture complex student behaviors, which are driven by the interplay between immediate short-term shocks and accumulated long-term patterns. To address this, we propose the Multi-Scale Spatio-Temporal Graph Network (MST-GCN). The core of our framework is a novel MST-RGCN layer featuring a Spatially-Conditioned Adaptive Gate. This mechanism dynamically modulates the fusion of short-term and long-term memories by explicitly conditioning on the evolving heterogeneous graph context. Comprehensive experiments on two large-scale benchmarks, KDD Cup 2015 and XuetangX, demonstrate that MST-GCN yields superior predictive performance compared to established baselines. Notably, our model exhibits remarkable robustness in unstructured, self-paced learning environments. Furthermore, qualitative analysis reveals that the model learns an interpretable policy: prioritizing long-term history to identify at-risk students while leveraging short-term momentum to predict successful learners. Our source code is publicly available at https://github.com/wudongze9/MST-GCN.
Duan et al. studied this question.
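To make the gating idea in the abstract concrete, the following is a minimal sketch of a context-conditioned gate that blends a short-term and a long-term memory vector. It is not the authors' MST-RGCN layer: the function name `adaptive_gate_fusion`, the per-dimension sigmoid gate, the convex-combination form, and the use of a plain vector as a stand-in for the heterogeneous graph context are all assumptions made for illustration. The released repository should be consulted for the actual formulation.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_gate_fusion(
    short_term: List[float],
    long_term: List[float],
    spatial_context: List[float],
    weights: List[List[float]],
    bias: List[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical gated fusion (an assumption, not the paper's exact layer).

    For each hidden dimension d:
        g_d = sigmoid(w_d . [short_term; long_term; spatial_context] + b_d)
        h_d = g_d * short_term[d] + (1 - g_d) * long_term[d]

    Returns the fused vector and the gate values.
    """
    # Concatenate both memories with the (assumed) graph-derived context vector.
    features = short_term + long_term + spatial_context
    fused, gates = [], []
    for d in range(len(short_term)):
        z = sum(w * f for w, f in zip(weights[d], features)) + bias[d]
        g = sigmoid(z)
        gates.append(g)
        # A gate near 1 favours short-term memory; near 0 favours long-term memory.
        fused.append(g * short_term[d] + (1.0 - g) * long_term[d])
    return fused, gates


if __name__ == "__main__":
    short = [1.0, 2.0]
    long_ = [3.0, 6.0]
    context = [0.5]
    zero_w = [[0.0] * 5 for _ in range(2)]
    print(adaptive_gate_fusion(short, long_, context, zero_w, [0.0, 0.0]))
```

With all-zero weights and biases the gate is 0.5 everywhere, so the output is the plain average of the two memories. In this sketch, learned weights on the context entries are what would let the surrounding graph state shift the balance toward long-term history or short-term momentum.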