What question did this study set out to answer?

This research aims to identify students at risk of dropping out by applying machine learning techniques.

March 24, 2026 · Open Access

Application of Machine Learning Techniques to Predict Students at Risk of Attrition in a Federal University

Key Points

Applied various supervised machine learning algorithms including Neural Network, Decision Tree, and Gradient Boosting.
Utilized an institutional dataset of 20,275 students admitted since 2010 across three campuses of a Brazilian federal university.
Evaluated model performance using accuracy, precision, recall, F1-score, and area under the ROC curve.
Gradient Boosting showed the highest predictive performance among tested models.
Key dropout predictors included cumulative grade point average and number of course failures.

Abstract

Student dropout in higher education remains a persistent challenge for public universities, generating significant academic, social, and economic impacts. Early student withdrawal compromises the efficiency of public investment in education, reduces graduation rates, and limits students’ professional opportunities. In this context, the use of machine learning techniques has shown promising potential for identifying patterns associated with dropout risk and supporting institutional decision-making. This study aimed to apply and compare different supervised machine learning algorithms to predict student dropout in a Brazilian federal university, as well as to identify the main factors associated with academic attrition. The research adopted a quantitative approach using an institutional dataset comprising 20,275 students admitted since 2010 across three campuses of the university. Several classification algorithms were tested, including Neural Network, Decision Tree, Random Forest, Gradient Boosting, AdaBoost, Naive Bayes, and Logistic Regression. Model performance was evaluated using metrics such as accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC), along with model interpretability through the SHAP technique. The results indicated strong predictive performance across the models, with Gradient Boosting demonstrating the best overall results. The most influential predictors of dropout were cumulative grade point average and the number of course failures. The findings suggest that machine learning models can support the early identification of at-risk students and contribute to institutional retention policies.
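The evaluation metrics named in the abstract can be illustrated with a short, self-contained sketch. This is not the authors' code: the function names `classification_metrics` and `roc_auc`, the 0.5 decision threshold, and the eight toy labels and scores are all assumptions made up for illustration, with label 1 standing for "dropped out". AUC is computed here by its pairwise-ranking definition, which is one common way to obtain it.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = dropout)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # at-risk, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not at-risk, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged by mistake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed dropout

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Toy data (made up): true outcomes and model-assigned dropout probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(classification_metrics(y_true, y_pred))
print("auc:", roc_auc(y_true, scores))
```

Precision and recall matter separately in this setting: low recall means at-risk students go unnoticed, while low precision means retention resources are spent on students who would have stayed. AUC, unlike the other four metrics, does not depend on the chosen threshold.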
