March 15, 2026 | Open Access

Predicting University Dropout from Pre-Enrollment Data: A Comparison of Interpretable Machine Learning Models

Key Points

This research aims to predict university dropout using pre-enrollment data alone.
Analyzed pre-enrollment signals from the UCI Predict Students' Dropout and Academic Success dataset (n=3,630)
Compared three interpretable machine learning models: Logistic Regression, Decision Tree, and Random Forest
Evaluated all models with 5-fold stratified cross-validation and a held-out test set
Random Forest achieved the highest AUC-ROC of 0.812
All models yielded an identical F1-score of 0.66 on the test set
Financial and sociodemographic factors were found to be more predictive than prior academic performance
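
For readers unfamiliar with the evaluation setup named above, the following is a minimal sketch of how stratified 5-fold splitting works: each fold keeps roughly the same dropout rate as the full dataset. It uses only the Python standard library and toy labels, not the study's data. The function name stratified_k_fold and the 30% dropout rate are illustrative assumptions, and the study itself may well have used a library implementation instead.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios track the full data."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Toy labels (assumption, NOT the study's data): 30 dropouts (1) among 100 students.
labels = [1] * 30 + [0] * 70
splits = list(stratified_k_fold(labels, k=5, seed=42))

for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    dropout_rate = sum(labels[i] for i in test_idx) / len(test_idx)
    print(f"fold {fold_no}: test size={len(test_idx)}, dropout rate={dropout_rate:.2f}")
```

With these toy labels every test fold holds 20 students, 6 of them dropouts, so each fold mirrors the overall 30% rate. Plain (unstratified) splitting gives no such guarantee, which matters when one class is much rarer than the other.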

Abstract

University dropout represents a significant challenge across European higher education systems, yet most predictive approaches rely on data collected after enrollment has already begun. This study investigates whether dropout risk can be effectively predicted using exclusively pre-enrollment signals — information available at or before the admission stage. Using the UCI Predict Students' Dropout and Academic Success dataset (n=3,630), I train and compare three interpretable machine learning models: Logistic Regression, Decision Tree, and Random Forest. All models are evaluated via 5-fold stratified cross-validation and tested on a held-out set. Random Forest achieves the best AUC-ROC (0.812), while all three models converge to an identical F1-score of 0.66 on the test set, providing empirical support for preferring simpler, more transparent models. SHAP-based feature importance analysis reveals that financial and sociodemographic variables — in particular scholarship status, age at enrollment, and debtor status — are substantially more predictive than prior academic performance. These findings have direct implications for the design of EU AI Act-compliant educational decision-support systems.
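
The abstract reports that the models differ in AUC-ROC but share the same F1-score. The sketch below shows one way such a pattern can arise in general: AUC-ROC measures how well scores rank students across all thresholds, while F1 depends only on the predictions at a single threshold. The scores are toy values for two hypothetical models, not outputs from the study, and the 0.5 threshold and positive-class F1 are assumptions, since the abstract does not state how F1 was computed.

```python
def auc_roc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def f1_score(y_true, scores, threshold=0.5):
    """F1 for the positive class after thresholding the scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumption): 1 = dropout, 0 = no dropout.
y_true = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]  # hypothetical model A
scores_b = [0.9, 0.8, 0.1, 0.6, 0.3, 0.2]  # hypothetical model B

# Both models make the same calls at threshold 0.5, so F1 is identical,
# but model A ranks the missed dropout above two non-dropouts, so its AUC is higher.
f1_a, f1_b = f1_score(y_true, scores_a), f1_score(y_true, scores_b)
auc_a, auc_b = auc_roc(y_true, scores_a), auc_roc(y_true, scores_b)

print(f"model A: F1={f1_a:.3f}, AUC-ROC={auc_a:.3f}")
print(f"model B: F1={f1_b:.3f}, AUC-ROC={auc_b:.3f}")
```

This is an illustration of the metrics only. It does not explain why the three models in the study ended up with the same F1-score.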
