Higher education institutions face the persistent challenge of student attrition, a critical risk node within the educational supply chain (ESC). This study adopts a supply chain management (SCM) perspective to apply artificial intelligence (AI) for sustainable transformation of the ESC and evaluates an early warning system (EWS) for student performance prediction in a single programming course at Tamkang University. Learning trajectory data from 188 students across four semesters (90 for training, 98 for temporal validation; 30 fail cases in total) were collected from the iClass learning management system. To match the operational goal of the EWS, namely maximizing detection of at-risk students, the minority Fail class was treated as the positive class, so that recall directly measures sensitivity to at-risk cases. Three models were compared under a 5-seed protocol with time-masking to prevent future-week leakage: Random Forest (RF) with SMOTE, GRU, and LSTM. Averaged across weeks 6–16 and both validation semesters, RF achieved an accuracy of 85.59%, a Fail-recall of 91.19%, a precision of 58.89%, and an F1 of 70.36%, and already provided reliable warning at Week 6 (Fail-recall 87.86%). Under the same protocol, LSTM and GRU collapsed to the majority class during weeks 6–10 (Fail-recall 0–42%), yielding higher headline accuracy but substantially lower sensitivity; they became usable only from Week 14 onwards (LSTM Fail-recall 80.00% at Week 14, 82.86% at Week 16). A Wilcoxon test on Cohen’s d over 90 (week×feature) pairs showed that cumulative features exhibit larger, not smaller, between-class separation than original features (|d| 0.717 vs. 0.192; p<0.001), indicating that the original-vs-cumulative trade-off is one of sensitivity versus precision rather than information dilution.
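To make the evaluation setup above concrete, the following is a minimal sketch, not the study's actual pipeline, of two ideas the protocol relies on: masking features from weeks after the prediction week, and scoring with Fail as the positive class. The label encoding (1 = Fail, 0 = Pass), the zero fill value, and the function names `mask_future_weeks` and `fail_metrics` are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence

FAIL = 1  # assumed encoding: 1 = Fail (positive class), 0 = Pass


def mask_future_weeks(weekly: Sequence[float], cutoff_week: int,
                      fill: float = 0.0) -> List[float]:
    """Keep values for weeks 1..cutoff_week; replace later weeks with `fill`.

    This mimics time-masking: a prediction made at week k must not see
    any activity recorded after week k.
    """
    return [v if i < cutoff_week else fill for i, v in enumerate(weekly)]


def fail_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall, precision and F1 with Fail as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == FAIL and p == FAIL)
    fp = sum(1 for t, p in pairs if t != FAIL and p == FAIL)
    fn = sum(1 for t, p in pairs if t == FAIL and p != FAIL)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Toy data (invented for illustration): 2 Fail and 8 Pass students.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
collapsed = [0] * 10                         # always predicts Pass
sensitive = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # flags both Fails plus 3 Passes

collapsed_scores = fail_metrics(y_true, collapsed)
sensitive_scores = fail_metrics(y_true, sensitive)
```

On this toy data the majority-class predictor reaches the higher accuracy while detecting no Fail cases, whereas the more sensitive predictor detects every Fail case at the cost of precision. That is the pattern the abstract describes when it contrasts headline accuracy with Fail-recall.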
As design-level companions to these empirical results, the study also proposes a three-tier institutionalization framework and a four-dimensional impact assessment framework; these are offered as implementation blueprints rather than empirically validated outcomes. The contributions of this paper are operational rather than methodologically novel: (i) a reproducible EWS benchmark on a small, imbalanced ESC dataset, including a diagnosis of LSTM/GRU’s early-week majority-class collapse under naive augmentation, and (ii) design-level institutionalization and impact-assessment scaffolding offered as a template for subsequent institutional pilots.
Chen-Chung Chi studied this question.