This study aimed to address the data imbalance problem, which often degrades prediction performance for minority classes in multi-class classification of educational data. To this end, it compared the performance of 48 classification models combining six oversampling techniques (RandomOverSampler, SMOTE, BorderlineSMOTE, ADASYN, SMOTE+ENN, SMOTE+Tomek) and eight machine learning algorithms (CatBoost, XGBoost, LightGBM, RandomForest, ExtraTrees, LogisticRegression, SVM, KNN). Using data from the Multicultural Adolescents Panel Study (MAPS), adolescents were classified into groups based on their career competency levels. The results revealed that the CatBoost model combined with RandomOverSampler achieved the highest performance across key evaluation metrics, including Accuracy, Macro F1, Macro Recall, and MMCC. Based on this optimal model, the top 10% of important predictors (18 variables) were extracted using SHAP, permutation importance, and impurity-based importance methods. Predictive contributions and nonlinear relationships were then analyzed visually for the seven core variables identified by all three approaches. The findings indicated that individual psychological and behavioral factors, such as self-esteem, career decision-making attitudes, preparedness for higher education, and peer relationships, were the most influential predictors of career competency types. This study provides a methodological foundation for early identification of vulnerable groups and the development of tailored career support systems.
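As a rough illustration of two ideas the abstract relies on, the sketch below shows random oversampling (duplicating minority-class rows until the classes are balanced) and macro-averaged recall and F1 (which weight every class equally). It uses only the Python standard library and is not the study's code or the imbalanced-learn implementation; the function names, the toy labels (`high`, `mid`, `low`), and the class counts are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    reaches the majority-class count (illustrative sketch only)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)  # sample with replacement from this class
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


def macro_recall_f1(y_true, y_pred):
    """Unweighted mean of per-class recall and F1 over the classes in y_true."""
    labels = sorted(set(y_true))
    recalls, f1s = [], []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    return sum(recalls) / len(labels), sum(f1s) / len(labels)


# Hypothetical imbalanced training labels: 6 "high", 3 "mid", 1 "low".
X_train = [[i] for i in range(10)]
y_train = ["high"] * 6 + ["mid"] * 3 + ["low"] * 1
X_res, y_res = random_oversample(X_train, y_train, seed=42)
balanced_counts = Counter(y_res)

# A classifier that always predicts the majority class looks acceptable on
# accuracy but poor on macro metrics, which is why macro metrics are reported.
y_true = ["high", "high", "high", "high", "mid", "low"]
y_pred = ["high"] * 6
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_recall, macro_f1 = macro_recall_f1(y_true, y_pred)
```

In practice, resampling of this kind is applied to the training split only, so that the evaluation data keep their original class distribution. Whether the study followed exactly this procedure is not stated in the abstract.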
Nayoung Kim
Korean Society for Educational Evaluation
Nayoung Kim studied this question.
www.synapsesocial.com/papers/68ebabe3155248a327effc30 — DOI: https://doi.org/10.31158/jeev.2025.38.3.623