Abstract

Student dropout in higher education remains a persistent challenge for public universities, with significant academic, social, and economic impacts. Early student withdrawal compromises the efficiency of public investment in education, reduces graduation rates, and limits students’ professional opportunities. In this context, machine learning techniques have shown promise for identifying patterns associated with dropout risk and for supporting institutional decision-making. This study aimed to apply and compare different supervised machine learning algorithms to predict student dropout at a Brazilian federal university, and to identify the main factors associated with academic attrition. The research adopted a quantitative approach using an institutional dataset comprising 20,275 students admitted since 2010 across three campuses of the university. Several classification algorithms were tested, including Neural Network, Decision Tree, Random Forest, Gradient Boosting, AdaBoost, Naive Bayes, and Logistic Regression. Model performance was evaluated using metrics such as accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC), and model interpretability was examined with the SHAP technique. The results indicated strong predictive performance across the models, with Gradient Boosting achieving the best overall results. The most influential predictors of dropout were cumulative grade point average and the number of course failures. The findings suggest that machine learning models can support the early identification of at-risk students and contribute to institutional retention policies.
Almeida et al. (Sun,) studied this question.
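
For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, F1-score, and AUC can be computed for a binary dropout classifier. It is illustrative only: the labels, scores, and 0.5 decision threshold are hypothetical and are not taken from the study's dataset, and the function names are assumptions introduced here, not the authors' implementation.

```python
# Illustrative only: toy labels and scores, NOT data from the study.
# 1 = student dropped out, 0 = student did not drop out.

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive (dropout) class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted dropout probabilities for eight students.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

# Assumed decision threshold of 0.5 turns scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Precision, recall, and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of scores alone. This is one reason studies of this kind commonly report AUC alongside the threshold-based metrics.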