Purpose
This study aims to develop and evaluate machine learning models to predict dropout across three key stages of the undergraduate journey: pre-enrollment, the end of the first semester, and the end of the third semester.

Design/methodology/approach
The study uses institutional data from 4,394 engineering students enrolled between 2007 and 2016 at a Brazilian public university. Five classification algorithms were evaluated using standard performance metrics. An out-of-time validation was conducted using data from a future cohort to assess the generalizability of the models. The study also investigates the impact of adjusting the classification threshold to optimize model sensitivity for early identification of at-risk students.

Findings
Models based solely on pre-enrollment data showed limited predictive performance. Incorporating first-semester academic results led to substantial improvements, while gains from third-semester data were modest. Random Forest consistently outperformed the other algorithms. A slight adjustment to the classification threshold in the third-semester model increased sensitivity from 66% to nearly 80%, with only a minor drop in precision, thereby improving the model's capacity to detect at-risk students.

Originality/value
In addition to comparing multiple prediction timeframes, this study introduces a practical evaluation of classification threshold adjustment. By demonstrating how threshold tuning can enhance early detection while maintaining high precision, the study offers actionable insights for data-informed intervention strategies in higher education.
Curbani et al. studied this question.
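The threshold adjustment described in the abstract can be illustrated with a minimal sketch. It assumes a classifier that outputs a dropout probability per student and treats any score at or above the threshold as "at risk". The function name `precision_recall_at`, the scores, and the labels are all hypothetical and invented for illustration; they are not the study's data, code, or results.

```python
from typing import List, Tuple


def precision_recall_at(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Return (precision, recall) when score >= threshold is flagged as dropout.

    labels use 1 for a student who dropped out and 0 otherwise.
    """
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1  # correctly flagged dropout
        elif flagged and label == 0:
            fp += 1  # flagged, but the student stayed
        elif not flagged and label == 1:
            fn += 1  # missed dropout
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical predicted dropout probabilities and true outcomes
# (made up for this sketch, not taken from the study).
scores = [0.95, 0.80, 0.62, 0.55, 0.45, 0.42, 0.38, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

for threshold in (0.5, 0.4):
    precision, recall = precision_recall_at(scores, labels, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.4 flags one more true dropout and one more false alarm, so sensitivity (recall) rises while precision falls. That is the same kind of trade-off the abstract reports, although the numbers here carry no empirical meaning.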