ABSTRACT This study proposes a theory‐informed learning analytics approach for predicting at‐risk students using large‐scale behavioural, demographic and academic data. Building on engagement theory and self‐regulated learning, we engineer pedagogically grounded behavioural indicators that move beyond raw click counts. These indicators include multidimensional engagement measures, temporal regularity and an antiprocrastination score derived from assessment submission patterns. Using the Open University Learning Analytics Dataset (OULAD), comprising 32,593 students across 22 courses, we reformulate the prediction task as a binary classification problem (Pass vs. At‐Risk) and compare three machine learning algorithms: Support Vector Machine (SVM), Random Forest (RF) and Extreme Gradient Boosting (XGBoost). Models are evaluated at four quarter‐based checkpoints over the semester to investigate temporal dynamics and opportunities for timely intervention. Results show that XGBoost consistently outperforms RF and SVM in accuracy, recall, precision and ROC AUC, while behavioural features overwhelmingly dominate demographic and academic variables in predictive importance. Temporal analysis reveals that model performance improves substantially from the first to the third quarter, with mid‐semester predictions offering the best trade‐off between accuracy and time remaining for effective support. The findings demonstrate the value of theory‐driven feature engineering and temporally sensitive evaluation in designing early‐warning systems that are both accurate and pedagogically actionable.
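The quarter-based checkpoints and the behavioural indicators described above can be illustrated with a minimal sketch. It is not the paper's implementation. The 240-day course length, the record layout, and the function names are hypothetical. The two feature definitions are also assumptions: temporal regularity is taken as the share of elapsed weeks with any activity, and the antiprocrastination score as the mean number of days submitted before the deadline. The one point it is meant to show is that each feature uses only data visible before the checkpoint, so an early prediction never draws on later behaviour.

```python
from statistics import mean

# Hypothetical course length in days; the abstract does not state one.
COURSE_LENGTH_DAYS = 240


def quarter_cutoffs(course_length=COURSE_LENGTH_DAYS):
    """Return the number of elapsed days at the end of each quarter."""
    return [course_length * q // 4 for q in (1, 2, 3, 4)]


def temporal_regularity(active_days, cutoff_day):
    """Fraction of elapsed weeks with at least one active day (0 to 1).

    Days are 0-indexed; only days strictly before `cutoff_day` are visible.
    Illustrative definition, assumed for this sketch.
    """
    weeks_elapsed = max(1, -(-cutoff_day // 7))  # ceiling division
    active_weeks = {d // 7 for d in active_days if 0 <= d < cutoff_day}
    return len(active_weeks) / weeks_elapsed


def antiprocrastination_score(submissions, cutoff_day):
    """Mean days between submission and deadline (positive means early).

    `submissions` is a list of (deadline_day, submitted_day) pairs. Only
    pairs fully observed before `cutoff_day` are used. Illustrative
    definition, assumed for this sketch.
    """
    leads = [
        due - done
        for due, done in submissions
        if due < cutoff_day and done < cutoff_day
    ]
    return mean(leads) if leads else 0.0


def features_at_checkpoints(student):
    """Build one feature row per quarter checkpoint for a single student."""
    rows = []
    for quarter, cutoff in enumerate(quarter_cutoffs(), start=1):
        rows.append({
            "quarter": quarter,
            "regularity": temporal_regularity(student["active_days"], cutoff),
            "antiprocrastination": antiprocrastination_score(
                student["submissions"], cutoff
            ),
        })
    return rows


# Made-up student record, for illustration only.
student = {
    "active_days": [0, 1, 8, 15, 16, 40, 70, 130],
    "submissions": [(30, 25), (90, 92), (150, 149)],
}
rows = features_at_checkpoints(student)
```

Each row in `rows` could then feed a separate classifier per checkpoint (for example SVM, RF or XGBoost, as compared in the study), which allows performance to be compared across quarters.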
Saleh Alhazbi
Expert Systems
Qatar University
Saleh Alhazbi studied this question.
synapsesocial.com/papers/69730f78c8125b09b0d1f408 — DOI: https://doi.org/10.1111/exsy.70210