What question did this study set out to answer?

The aim is to develop machine learning models to effectively predict student dropout at different stages of their undergraduate journey.

April 5, 2026

A machine learning approach to predicting student dropout and supporting timely intervention

Key Points

The study used institutional data from 4,394 engineering students enrolled between 2007 and 2016 at a Brazilian public university.
Five classification algorithms were evaluated using standard performance metrics.
An out-of-time validation on data from a future cohort assessed how well the models generalize.
Classification thresholds were adjusted to improve model sensitivity.
Models based solely on pre-enrollment data showed limited predictive performance.
Incorporating first-semester academic results improved predictions substantially, while gains from third-semester data were modest.
Random Forest consistently outperformed the other algorithms.
Adjusting the classification threshold in the third-semester model increased sensitivity from 66% to nearly 80%, with only a minor drop in precision.
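
The trade-off behind threshold adjustment can be illustrated with a minimal Python sketch. It is not the study's code or data: the labels, the risk scores, and the two thresholds (0.5 and 0.45) are hypothetical values chosen only to show how lowering the cutoff flags more true dropouts (higher sensitivity) while admitting more false alarms (lower precision).

```python
def confusion_counts(y_true, scores, threshold):
    """Count true positives, false positives and false negatives
    when a student is flagged as at-risk if score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    return tp, fp, fn


def sensitivity_and_precision(y_true, scores, threshold):
    """Return (sensitivity, precision) at the given threshold."""
    tp, fp, fn = confusion_counts(y_true, scores, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Hypothetical toy data: 1 = dropped out, 0 = stayed enrolled.
# The scores stand in for a classifier's predicted dropout probability.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.92, 0.81, 0.64, 0.47, 0.30, 0.55, 0.46, 0.25, 0.12, 0.08]

for threshold in (0.5, 0.45):
    sens, prec = sensitivity_and_precision(y_true, scores, threshold)
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  precision={prec:.2f}")
```

In this toy example, lowering the threshold from 0.5 to 0.45 catches one more actual dropout and adds one false alarm. Whether that exchange is worthwhile depends on the cost of a missed at-risk student relative to an unnecessary intervention.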

Abstract

Purpose: This study aims to develop and evaluate machine learning models to predict dropout across three key stages of the undergraduate journey: pre-enrollment, the end of the first semester, and the end of the third semester.

Design/methodology/approach: The study uses institutional data from 4,394 engineering students enrolled between 2007 and 2016 at a Brazilian public university. Five classification algorithms were evaluated using standard performance metrics. An out-of-time validation was conducted using data from a future cohort to assess the generalizability of the models. The study also investigates the impact of adjusting the classification threshold to optimize model sensitivity for early identification of at-risk students.

Findings: Models based solely on pre-enrollment data showed limited predictive performance. Incorporating first-semester academic results led to substantial improvements, while gains from third-semester data were modest. Random Forest consistently outperformed the other algorithms. A slight adjustment to the classification threshold in the third-semester model increased sensitivity from 66% to nearly 80%, with only a minor drop in precision, thereby improving the model's capacity to detect at-risk students.

Originality/value: In addition to comparing multiple prediction timeframes, this study introduces a practical evaluation of classification threshold adjustment. By demonstrating how threshold tuning can enhance early detection while maintaining high precision, the study offers actionable insights for data-informed intervention strategies in higher education.
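
The out-of-time validation described in the abstract can be sketched as a split by entry cohort rather than a random split. The Python sketch below only illustrates that idea under stated assumptions: the `StudentRecord` type, its field names, the toy records, and the 2017 holdout cohort are all hypothetical, since the abstract does not say which future cohort was used.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # Hypothetical record layout, for illustration only.
    student_id: int
    entry_year: int
    dropped_out: bool


def out_of_time_split(records, last_training_year):
    """Split records so the holdout set contains only cohorts that
    entered after every cohort in the training set."""
    train = [r for r in records if r.entry_year <= last_training_year]
    holdout = [r for r in records if r.entry_year > last_training_year]
    return train, holdout


# Toy records, not study data.
records = [
    StudentRecord(1, 2014, False),
    StudentRecord(2, 2015, True),
    StudentRecord(3, 2016, False),
    StudentRecord(4, 2016, True),
    StudentRecord(5, 2017, False),  # assumed "future" cohort
    StudentRecord(6, 2017, True),
]

train, holdout = out_of_time_split(records, last_training_year=2016)
print(len(train), "training records;", len(holdout), "out-of-time records")
```

Splitting by time means the model is scored on students it could not have seen during training, which is closer to how it would be used on an incoming cohort than a random hold-out drawn from the same years.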
