Timely identification of students at risk of meaningful academic decline enables targeted advising and reduces avoidable failures. We present a leakage-aware, two-stage machine learning framework that first forecasts subject-wise course marks using only information available by the end of prior semesters, and then translates these forecasts into cohort-relative risk signals suitable for operational interventions. Using anonymized records for 905 undergraduate engineering students with 56 features (demographics, attendance, and past marks), we model four core subjects independently and compute predicted cohort percentiles. A student is labeled ``at risk'' if the predicted cohort percentile drops by 10 or more points relative to the prior semester; Semester~3 serves as an illustrative case study. Across subjects, a Voting Regressor (Ridge + Lasso + ElasticNet) with one-hot encoding and robust scaling yields test MAEs between 5.71 and 7.10. A stacking classifier (CatBoost, Balanced-Bagging LGBM, and ExtraTrees with a logistic meta-learner) attains a test accuracy of 0.674, recall of 0.657, and F1 of 0.438 at a decision threshold chosen to prioritize recall. We discuss leakage prevention, deployment, ethical considerations, and directions for multi-institution validation. A lightweight web implementation of the pipeline is accessible online.
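The cohort-relative labeling rule can be illustrated with a minimal sketch. It assumes a percentile is the share of the cohort scoring at or below a student, and that the prior-semester percentile comes from observed marks while the current one comes from forecast marks; the exact percentile definition used in the study is not stated here. The function names (`cohort_percentiles`, `flag_at_risk`) and the toy marks are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def cohort_percentiles(scores):
    """Percentile rank (0-100) of each score within its cohort.

    Assumption: percentile = share of the cohort scoring at or below
    the student. Other definitions (e.g. mean rank) are equally plausible.
    """
    ordered = sorted(scores)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, s) / n for s in scores]


def flag_at_risk(prior_marks, predicted_marks, drop_threshold=10.0):
    """Flag students whose predicted percentile falls by at least
    `drop_threshold` points relative to their prior-semester percentile."""
    prior_pct = cohort_percentiles(prior_marks)
    predicted_pct = cohort_percentiles(predicted_marks)
    return [
        (before - after) >= drop_threshold
        for before, after in zip(prior_pct, predicted_pct)
    ]


# Hypothetical toy cohort of five students (illustrative numbers only).
prior = [55, 62, 70, 78, 90]        # observed prior-semester marks
predicted = [60, 58, 72, 65, 88]    # forecast marks for the next semester
flags = flag_at_risk(prior, predicted)
print(flags)  # [False, True, False, True, False]
```

In this toy cohort the second and fourth students each fall 20 percentile points and are flagged, while a student whose raw mark changes but whose rank holds is not. Working on percentiles rather than raw marks keeps the signal comparable across subjects with different grading scales.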
Shail Patel (Wed,) studied this question.