What question did this study set out to answer?

The aim is to develop an efficient and interpretable machine learning framework for predicting undergraduate academic outcomes.

February 12, 2026 · Open Access

Efficient and Interpretable Machine Learning for Student Academic Outcome Prediction

Key Points

The aim is to develop an efficient and interpretable machine learning framework for predicting undergraduate academic outcomes.
The study uses a dataset of 4424 student records from 17 undergraduate programs.
Features are selected by recursive feature elimination with Gradient Boosting and Random Forest models.
Dimensionality is reduced by retaining only 20 informative predictors.
SHAP is integrated for model interpretability and feature contribution analysis.
The Gradient Boosting model achieved an AUC of 0.891, the best among the algorithms compared.
The key factors influencing outcomes are second-semester academic engagement (courses approved, evaluated, and enrolled), tuition fee payment status, and age at enrollment.
A compact feature set can maintain strong classification performance while keeping model behavior interpretable.
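
The selection-and-evaluation steps listed above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' code: the data are synthetic, and the 36 input columns, the hyperparameters, the elimination step size, and the macro-averaged one-vs-rest AUC are all assumptions made for the example (the summary does not state them). Only Gradient Boosting is shown; the study also uses Random Forest.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a three-class outcome problem (assumed sizes, not the real data).
X, y = make_classification(
    n_samples=600, n_features=36, n_informative=10, n_redundant=4,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Recursive feature elimination: refit the model, drop the 4 least
# important features per round, stop when 20 features remain.
selector = RFE(
    estimator=GradientBoostingClassifier(n_estimators=30, random_state=0),
    n_features_to_select=20,
    step=4,
)
selector.fit(X_train, y_train)

# Train the final classifier on the reduced feature set only.
model = GradientBoostingClassifier(n_estimators=30, random_state=0)
model.fit(selector.transform(X_train), y_train)

# Multiclass AUC on held-out data (one-vs-rest, macro average: an assumed choice).
proba = model.predict_proba(selector.transform(X_test))
auc = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
print(f"features kept: {selector.support_.sum()}, macro OvR AUC: {auc:.3f}")
```

On real data, the selector would be fit on the training split only, as here, so that the held-out AUC is not biased by the feature selection.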

Abstract

Understanding and preventing student dropout presents a decision-critical modeling problem involving heterogeneous variables, nonlinear relationships, and the need for transparent inference. This study addresses the prediction of undergraduate academic outcomes (Graduation, Enrolled, and Dropout) by proposing an efficient and interpretable machine learning framework that explicitly balances predictive performance, feature efficiency, and algorithmic explainability. The empirical analysis relies on a dataset of 4424 student records across 17 undergraduate programs from the Polytechnic Institute of Portalegre, Portugal. In contrast to existing approaches that rely on high-dimensional input spaces and opaque predictive architectures, we develop a reduced-dimensional classification pipeline based on recursive feature elimination with Gradient Boosting and Random Forest models. Starting from a comprehensive set of demographic, academic, and financial indicators, only 20 informative predictors are retained for model construction, substantially reducing input complexity while preserving predictive capacity. Comparative evaluation across multiple learning algorithms identifies Gradient Boosting as the most effective model, achieving an AUC of 0.891. Beyond predictive accuracy, the proposed framework emphasizes model interpretability through the integration of SHapley Additive exPlanations (SHAP), enabling quantitative attribution of feature contributions at both global and instance levels. The analysis reveals that second-semester academic engagement variables (including the number of courses approved, evaluated, and enrolled), as well as tuition fee payment status and age at enrollment, are the dominant factors shaping student outcomes. Overall, the results demonstrate that strong classification performance can be achieved using a compact feature set while maintaining transparent and explainable model behavior.
By combining mathematically grounded feature selection with principled model explanation, this study advances methodological understanding of how efficiency, interpretability, and predictive accuracy can be jointly optimized in applied machine learning, with implications for decision-support systems in educational analytics.
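
To make the instance-level attribution idea concrete, the sketch below computes exact Shapley values for a single prediction by enumerating feature coalitions. It is an illustration of the quantity that SHAP estimates, not the SHAP library and not the study's model: `toy_score`, its coefficients, the feature names, and the baseline values are hypothetical, and "missing" features are simply replaced by a fixed baseline, whereas SHAP typically averages over background data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one instance.

    Features outside a coalition are set to their baseline value
    (a simplifying assumption for this sketch).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z):
    """Hypothetical score with one interaction term (not the paper's model)."""
    approved, fees_paid, age = z
    return 0.1 * approved + 0.3 * fees_paid - 0.01 * age + 0.05 * approved * fees_paid


x = [6.0, 1.0, 19.0]         # instance: courses approved, fees paid (0/1), age
baseline = [4.0, 0.0, 23.0]  # hypothetical reference student
phi = shapley_values(toy_score, x, baseline)

for name, contribution in zip(["approved", "fees_paid", "age"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between the instance's score and the baseline score, which is the additivity property that lets per-student explanations be aggregated into global feature rankings. Exact enumeration costs on the order of 2^n model evaluations per feature, so it is only practical for a handful of features; tree-specific algorithms are what make this feasible on models such as Gradient Boosting.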
