What question did this study set out to answer?

The aim is to develop a framework that predicts students' risk of academic decline using prior course information.

April 23, 2026 | Open Access

A Two-Stage, Leakage-Aware Framework for Early Academic Risk Detection in Undergraduate Engineering Cohorts

Key Points

The aim is to develop a framework that predicts students' risk of academic decline using prior course information.
Uses a two-stage machine learning framework: stage one forecasts subject-wise course marks, and stage two converts the forecasts into cohort-relative risk signals.
Analyzes anonymized records of 905 undergraduate engineering students with 56 features (demographics, attendance and past marks).
Forecasts marks with a Voting Regressor (Ridge + Lasso + ElasticNet) and classifies risk with a stacking classifier (CatBoost, Balanced-Bagging LGBM, ExtraTrees with a logistic meta-learner).
Achieves test mean absolute errors (MAEs) between 5.71 and 7.10 across the four core subjects.
Reaches test accuracy of 0.674, recall of 0.657 and F1 of 0.438 for risk classification, at a threshold chosen to prioritize recall.
Flags a student as at risk when the predicted cohort percentile drops by 10 or more points relative to the prior semester.
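
The summary reports recall and F1 but not precision. Because F1 is the harmonic mean of precision and recall, the missing value can be recovered from the two reported numbers. The sketch below is illustrative arithmetic only: the function name `precision_from_f1` is hypothetical, and the result inherits the rounding of the published figures.

```python
def precision_from_f1(f1: float, recall: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for precision P.

    Rearranging gives P = F1 * R / (2*R - F1), which is only
    defined when 2*R > F1.
    """
    denominator = 2.0 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 and recall are inconsistent: need 2*recall > F1")
    return f1 * recall / denominator


# Figures as reported in the summary (already rounded to three decimals).
reported_f1 = 0.438
reported_recall = 0.657

implied_precision = precision_from_f1(reported_f1, reported_recall)
print(f"implied precision: {implied_precision:.2f}")
```

Under these figures the implied precision is roughly 0.33, so about one in three flagged students would truly meet the at-risk definition. That is the expected trade-off when the operating threshold favors recall.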

Abstract

Timely identification of students at risk of meaningful academic decline enables targeted advising and reduces avoidable failures. We present a leakage-aware, two-stage machine learning framework that first forecasts subject-wise course marks using only information available up to the end of prior semesters, and then translates these forecasts into cohort-relative risk signals suitable for operational interventions. Using anonymized records for 905 undergraduate engineering students with 56 features (demographics, attendance and past marks), we model four core subjects independently and compute predicted cohort percentiles. A student is labeled "at risk" if the predicted cohort percentile drops by 10 or more points relative to the prior semester; Semester 3 serves as an illustrative case study to demonstrate the approach. Across subjects, a Voting Regressor (Ridge + Lasso + ElasticNet) with One-Hot encoding and Robust scaling yields test MAEs between 5.71 and 7.10. A stacking classifier (CatBoost, Balanced-Bagging LGBM, ExtraTrees with a logistic meta-learner) attains test accuracy 0.674, recall 0.657 and F1 0.438 when operating at a threshold chosen to prioritize recall. We discuss leakage prevention, deployment, ethical considerations, and directions for multi-institution validation. A lightweight web implementation of the pipeline is accessible online.
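
The at-risk rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The abstract does not say how cohort percentiles are computed, so the mid-rank definition used here is an assumption, as is taking the prior-semester percentile from actual prior marks. The function names and sample marks are hypothetical.

```python
from bisect import bisect_left, bisect_right


def cohort_percentiles(marks):
    """Return each mark's mid-rank percentile (0-100) within the cohort.

    Assumption: ties share the midpoint of their rank range. The source
    does not specify its percentile definition.
    """
    ordered = sorted(marks)
    n = len(ordered)
    percentiles = []
    for mark in marks:
        below = bisect_left(ordered, mark)           # marks strictly lower
        equal = bisect_right(ordered, mark) - below  # marks tied, incl. self
        percentiles.append(100.0 * (below + 0.5 * equal) / n)
    return percentiles


def flag_at_risk(prior_marks, predicted_marks, drop_threshold=10.0):
    """Flag students whose predicted percentile falls by >= drop_threshold."""
    prior_pct = cohort_percentiles(prior_marks)
    predicted_pct = cohort_percentiles(predicted_marks)
    return [
        (before - after) >= drop_threshold
        for before, after in zip(prior_pct, predicted_pct)
    ]


# Hypothetical five-student cohort: prior-semester marks and stage-one forecasts.
prior = [55, 62, 70, 78, 85]
predicted = [58, 74, 60, 80, 83]
print(flag_at_risk(prior, predicted))
```

In this toy cohort only the third student is flagged: the forecast moves that student from the 50th to the 30th percentile, a 20-point drop. Because the rule is cohort-relative, a student can be flagged with a lower mark that would otherwise look acceptable, and a falling mark is not flagged if the whole cohort falls with it.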
