What question did this study set out to answer?

The research aims to develop and evaluate an AI-driven early warning system to predict student performance and reduce attrition in higher education.

June 4, 2026 · Open Access

AI-Driven Sustainable Transformation of the Educational Supply Chain: Comparative Evaluation of Machine Learning Models for an Early Warning System and Design-Level Frameworks for Institutionalization and Impact Assessment

Key Points

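Collected learning trajectory data from 188 students over four semesters (90 for training, 98 for temporal validation; 30 fail cases in total) using the iClass LMS.
Compared three machine learning models (Random Forest with SMOTE, GRU, and LSTM) under a 5-seed protocol with time-masking to prevent future-week leakage.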
Treated the minority Fail class as the positive class and assessed models with accuracy, precision, recall, and F1, so that recall directly measures sensitivity to at-risk students.
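Averaged across weeks 6–16 and both validation semesters, Random Forest achieved 85.59% accuracy and a Fail-recall of 91.19%, and already provided reliable early warnings at Week 6 (Fail-recall 87.86%).
LSTM and GRU collapsed to the majority class during weeks 6–10 (Fail-recall 0–42%) and became usable only from Week 14 onwards (LSTM Fail-recall 80.00% at Week 14).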
Cumulative features showed larger, not smaller, between-class separation than original features (Cohen's |d| 0.717 vs. 0.192; Wilcoxon p<0.001 over 90 week×feature pairs).
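Treating Fail as the positive class is what makes recall meaningful here. The minimal sketch below, assuming a hypothetical encoding of 1 = Fail and 0 = Pass, shows the confusion-matrix arithmetic and how a model that collapses to the majority class can report higher accuracy while detecting no at-risk students. The ten-student data are invented for illustration and are not taken from the study.

```python
from dataclasses import dataclass

# Hypothetical label encoding: Fail is the positive class.
FAIL, PASS = 1, 0


@dataclass
class FailMetrics:
    accuracy: float
    precision: float
    recall: float  # "Fail-recall": share of actual Fail students that were flagged
    f1: float


def fail_positive_metrics(y_true, y_pred):
    """Confusion-matrix metrics with Fail (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == FAIL and p == FAIL)
    fp = sum(1 for t, p in pairs if t == PASS and p == FAIL)
    fn = sum(1 for t, p in pairs if t == FAIL and p == PASS)
    tn = sum(1 for t, p in pairs if t == PASS and p == PASS)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is flagged or no Fail cases exist.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return FailMetrics(accuracy, precision, recall, f1)


# Invented toy cohort: 10 students, 2 of whom fail.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
collapsed = [0] * 10                        # predicts "Pass" for everyone
flagging = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # flags 5 students, including both failures

collapsed_metrics = fail_positive_metrics(y_true, collapsed)
flagging_metrics = fail_positive_metrics(y_true, flagging)
```

The collapsed model scores 80% accuracy with zero Fail-recall, while the flagging model drops to 70% accuracy but catches every at-risk student, mirroring the accuracy-versus-sensitivity pattern reported for LSTM/GRU versus Random Forest. In scikit-learn the same choice is expressed with `pos_label=1` in `recall_score`, `precision_score`, or `f1_score`.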

Abstract

Higher education institutions face the persistent challenge of student attrition, a critical risk node within the educational supply chain (ESC). This study adopts a supply chain management (SCM) perspective to apply artificial intelligence (AI) to the sustainable transformation of the ESC, and evaluates an early warning system (EWS) for student performance prediction in a single programming course at Tamkang University. Learning trajectory data from 188 students across four semesters (90 for training, 98 for temporal validation; 30 fail cases in total) were collected from the iClass learning management system. To match the operational goal of the EWS, namely maximizing detection of at-risk students, the minority Fail class was treated as the positive class, so that recall directly measures sensitivity to at-risk cases. Three models were compared under a 5-seed protocol with time-masking to prevent future-week leakage: Random Forest (RF) with SMOTE, GRU, and LSTM. Averaged across weeks 6–16 and both validation semesters, RF achieved an accuracy of 85.59%, a Fail-recall of 91.19%, a precision of 58.89%, and an F1 of 70.36%, and already provided reliable warnings at Week 6 (Fail-recall 87.86%). Under the same protocol, LSTM and GRU collapsed to the majority class during weeks 6–10 (Fail-recall 0–42%), yielding higher headline accuracy but substantially lower sensitivity; they became usable only from Week 14 onwards (LSTM Fail-recall 80.00% at Week 14, 82.86% at Week 16). A Wilcoxon test on Cohen's d over 90 (week×feature) pairs showed that cumulative features exhibit larger, not smaller, between-class separation than original features (|d| 0.717 vs. 0.192; p<0.001), indicating that the original-vs-cumulative trade-off is one of sensitivity versus precision rather than information dilution.
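The original-versus-cumulative comparison can be illustrated with a minimal sketch. It assumes, hypothetically, since the summary does not define the features, that an "original" feature is a student's raw LMS value for the prediction week and a "cumulative" feature is the running total up to that week; slicing the weekly series to weeks 1..t is one simple way to apply time-masking. Cohen's d uses the pooled sample standard deviation. The function names and toy numbers are invented and chosen only to make the contrast visible.

```python
import math
from statistics import mean, stdev


def visible_weeks(weekly, week):
    """Time mask: keep only weeks 1..week so later weeks cannot leak in."""
    return weekly[:week]


def original_feature(weekly, week):
    """Hypothetical 'original' feature: the raw value of the current week."""
    return visible_weeks(weekly, week)[-1]


def cumulative_feature(weekly, week):
    """Hypothetical 'cumulative' feature: running total up to the current week."""
    return sum(visible_weeks(weekly, week))


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2
                  + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Invented toy data: weekly LMS activity counts for weeks 1-4.
passing = [[3, 2, 4, 3], [4, 3, 3, 4], [2, 4, 3, 3]]
failing = [[2, 1, 3, 3], [1, 2, 2, 4], [2, 1, 2, 3]]

week = 4
d_original = cohens_d([original_feature(s, week) for s in passing],
                      [original_feature(s, week) for s in failing])
d_cumulative = cohens_d([cumulative_feature(s, week) for s in passing],
                        [cumulative_feature(s, week) for s in failing])
```

In this toy cohort the week-4 values alone do not separate the groups (d = 0), whereas the running totals do, because the cumulative feature carries the earlier-week gap forward. The study ran such comparisons over 90 (week×feature) pairs; given two equal-length lists of per-pair |d| values, `scipy.stats.wilcoxon` is one way to run the paired signed-rank test.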
As design-level companions to these empirical results, the study also proposes a three-tier institutionalization framework and a four-dimensional impact assessment framework. The contributions of this paper are operational rather than methodologically novel: (i) a reproducible EWS benchmark on a small, imbalanced ESC dataset, including a diagnosis of the early-week majority-class collapse of LSTM/GRU under naive augmentation; and (ii) design-level institutionalization and impact-assessment scaffolding, offered as implementation blueprints and a template for subsequent institutional pilots rather than as empirically validated outcomes of the present study.

Cite This Study

Chen-Chung Chi studied this question.

synapsesocial.com/papers/6a2116fad499ed480b16fccc
DOI: https://doi.org/10.3390/su18115523