University dropout is a significant challenge across European higher education systems, yet most predictive approaches rely on data collected after enrollment has already begun. This study investigates whether dropout risk can be predicted effectively using exclusively pre-enrollment signals, that is, information available at or before the admission stage. Using the UCI Predict Students' Dropout and Academic Success dataset (n=3,630), I train and compare three interpretable machine learning models: Logistic Regression, Decision Tree, and Random Forest. All models are evaluated via 5-fold stratified cross-validation and tested on a held-out set. Random Forest achieves the best AUC-ROC (0.812), while all three models reach the same F1-score of 0.66 on the test set, which provides empirical support for preferring the simpler, more transparent models. SHAP-based feature importance analysis reveals that financial and sociodemographic variables (in particular scholarship status, age at enrollment, and debtor status) are substantially more predictive than prior academic performance. These findings have direct implications for the design of EU AI Act-compliant educational decision-support systems.
A. Cecchi (Fri,) studied this question.
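The 5-fold stratified cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the study's code: the function names `stratified_folds` and `cross_validation_splits`, the seed, and the toy labels are hypothetical, and the abstract does not say which library was used. The sketch uses only the Python standard library and shows how fold assignment can keep the dropout rate roughly constant across folds.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold has a similar class mix.

    Hypothetical helper for illustration; not taken from the study.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, val


# Toy labels (assumed, not the UCI data): 1 = dropout, 0 = no dropout.
labels = [1] * 30 + [0] * 70

for train_idx, val_idx in cross_validation_splits(labels):
    dropout_rate = sum(labels[i] for i in val_idx) / len(val_idx)
    print(len(train_idx), len(val_idx), dropout_rate)
```

In this toy setting every validation fold keeps the overall 30% dropout rate, which is the property that makes per-fold metrics such as F1 and AUC-ROC comparable across folds. In practice a library routine would normally be used for this step.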