The use of electronic information systems in education has made it possible to identify academic risks in advance algorithmically. This study develops a machine learning model that predicts students' academic status from multidimensional data, including students' economic circumstances, in order to identify groups of students who may be at risk of dropping out. The study uses a public dataset. Samples that could compromise model training are excluded, and only two outcome categories are retained for binary classification. Random Forest, Gradient Boosting and XGBoost are combined with SMOTE oversampling to address class imbalance. Feature engineering produces key indicators such as the credit completion rate and the grade change rate, and a stacking ensemble is used to improve predictive accuracy. Experimental results show that the ensemble achieves an accuracy of 85% to 88% and an average macro-averaged F1 score of 0.85 to 0.88, effectively distinguishing potentially at-risk students and validating the robustness of the model.
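The abstract does not give formulas for the engineered features or the oversampling setup, so the following is only a minimal, dependency-free Python sketch under stated assumptions. It assumes the credit completion rate is approved units divided by enrolled curricular units and the grade change rate is the relative change between two semester averages; all function and field names are hypothetical. It shows the core SMOTE step (interpolating between a minority sample and one of its nearest minority neighbours) and how a macro-averaged F1 score is computed. It does not reproduce the Random Forest, Gradient Boosting, XGBoost or stacking models, which in practice would come from libraries such as scikit-learn, XGBoost and imbalanced-learn.

```python
import math
import random


# Hypothetical engineered features; the exact definitions are assumptions.
def credit_completion_rate(approved_units, enrolled_units):
    """Fraction of enrolled curricular units that were approved (0.0 if none)."""
    return approved_units / enrolled_units if enrolled_units > 0 else 0.0


def grade_change_rate(grade_sem1, grade_sem2):
    """Relative change of the average grade from semester 1 to semester 2."""
    return (grade_sem2 - grade_sem1) / grade_sem1 if grade_sem1 > 0 else 0.0


def smote(minority, n_new, k=5, seed=0):
    """Minimal SMOTE: create n_new synthetic points, each on the segment between
    a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(minority[j], x),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy records (hypothetical): (approved, enrolled, grade_sem1, grade_sem2, label),
    # where label 1 = at risk of dropping out.
    records = [
        (6, 6, 13.0, 14.0, 0), (5, 6, 12.0, 12.5, 0), (6, 7, 14.0, 13.5, 0),
        (6, 6, 15.0, 15.5, 0), (5, 5, 12.5, 13.0, 0), (6, 6, 13.5, 14.0, 0),
        (2, 6, 11.0, 9.5, 1), (1, 6, 10.5, 8.0, 1),
    ]
    X = [[credit_completion_rate(a, e), grade_change_rate(g1, g2)]
         for a, e, g1, g2, _ in records]
    y = [label for *_, label in records]
    minority = [x for x, label in zip(X, y) if label == 1]
    n_majority = y.count(0)
    # Oversample the at-risk class until both classes have the same size.
    synthetic = smote(minority, n_new=n_majority - len(minority), k=5)
    print(len(minority) + len(synthetic), "at-risk vs", n_majority, "not at-risk")
    print("macro F1:", round(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]), 4))
```

Macro averaging weights each class equally, which is why it is a more informative metric than plain accuracy when the at-risk class is small. In a pipeline like this sketch, oversampling would normally be applied to the training split only, so that synthetic points do not leak into the evaluation data.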
Haochen Liu studied this question.