What type of study is this?

This is a quantitative study (also classified as a cohort study).

October 12, 2025

Comparative Study on the Performance of Imbalanced Multi-Class Classification Using Oversampling and Machine Learning

Key Points

Among the 48 oversampler and classifier combinations compared, CatBoost with RandomOverSampler achieved the highest performance across the key evaluation metrics.
The key evaluation metrics were Accuracy, Macro F1, Macro Recall, and MMCC.
The study utilized data from the Multicultural Adolescents Panel Study, focusing on career competency levels among adolescents.
Findings emphasize the need for tailored support systems based on individual predictors like self-esteem and peer relationships.
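
The macro-averaged metrics named in the Key Points weight every class equally, which is why they are commonly preferred over plain accuracy on imbalanced data. The following is a minimal, standard-library sketch of how Accuracy, Macro Recall, and Macro F1 differ on a skewed toy sample. The class labels ("high", "mid", "low") and the toy predictions are assumptions made for illustration; they are not taken from the MAPS data or from the study's code.

```python
def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def accuracy(y_true, y_pred):
    """Share of samples predicted correctly, regardless of class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (a class with no true samples counts as 0)."""
    recalls = []
    for tp, _fp, fn in per_class_counts(y_true, y_pred).values():
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, using F1 = 2*tp / (2*tp + fp + fn)."""
    f1_scores = []
    for tp, fp, fn in per_class_counts(y_true, y_pred).values():
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy sample: 8 "high", 1 "mid", 1 "low".
y_true = ["high"] * 8 + ["mid"] + ["low"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["high"] * 10

print("accuracy    :", round(accuracy(y_true, y_pred), 3))
print("macro recall:", round(macro_recall(y_true, y_pred), 3))
print("macro F1    :", round(macro_f1(y_true, y_pred), 3))
```

In this toy case the majority-only classifier reaches an accuracy of 0.8 while its macro recall is only one third, because the two minority classes are never predicted. That gap is the kind of minority-class failure that macro-averaged metrics are meant to expose.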

Abstract

This study aimed to address the data imbalance problem, which often deteriorates the prediction performance of minority classes in multi-class classification using educational data. To this end, the performance of 48 classification models combining six oversampling techniques (RandomOverSampler, SMOTE, BorderlineSMOTE, ADASYN, SMOTE+ENN, SMOTE+Tomek) and eight machine learning algorithms (CatBoost, XGBoost, LightGBM, RandomForest, ExtraTrees, LogisticRegression, SVM, KNN) was compared. Using data from the Multicultural Adolescents Panel Study (MAPS), adolescents were classified into groups based on their career competency levels. The results revealed that the CatBoost model combined with RandomOverSampler achieved the highest performance across key evaluation metrics, including Accuracy, Macro F1, Macro Recall, and MMCC. Based on this optimal model, the top 10% (18 variables) of important predictors were extracted using SHAP, permutation importance, and impurity-based importance methods, and a visual analysis of predictive contributions and nonlinear relationships for seven core variables commonly identified across all three approaches was conducted. The findings indicated that individual psychological and behavioral factors—such as self-esteem, career decision-making attitudes, preparedness for higher education, and peer relationships—were the most influential predictors of career competency types. This study provides a methodological foundation for early identification of vulnerable groups and the development of tailored career support systems.
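
The experimental design described in the abstract can be sketched as follows: six oversamplers crossed with eight classifiers give 48 model combinations, and the simplest oversampler, random oversampling, duplicates minority-class rows until every class matches the majority count. This is a standard-library illustration under stated assumptions, not the author's implementation. The function `random_oversample`, the toy features, and the class labels are hypothetical. The abstract does not say how resampling was applied; in general practice it is applied to the training split only, so that duplicated rows do not leak into evaluation data.

```python
import random
from collections import Counter
from itertools import product

# Technique and algorithm names as listed in the abstract.
OVERSAMPLERS = [
    "RandomOverSampler", "SMOTE", "BorderlineSMOTE",
    "ADASYN", "SMOTE+ENN", "SMOTE+Tomek",
]
CLASSIFIERS = [
    "CatBoost", "XGBoost", "LightGBM", "RandomForest",
    "ExtraTrees", "LogisticRegression", "SVM", "KNN",
]

# Every (oversampler, classifier) pair: 6 x 8 = 48 combinations.
MODEL_GRID = list(product(OVERSAMPLERS, CLASSIFIERS))


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class (with replacement)
    until every class has as many rows as the largest class.

    Hypothetical helper for illustration; original rows are kept unchanged
    at the front of the returned lists.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Hypothetical imbalanced training split: 7 "high", 2 "mid", 1 "low".
X_train = [[i] for i in range(10)]
y_train = ["high"] * 7 + ["mid"] * 2 + ["low"]

X_res, y_res = random_oversample(X_train, y_train, seed=42)

print("combinations:", len(MODEL_GRID))
print("before:", dict(Counter(y_train)))
print("after :", dict(Counter(y_res)))
```

Random oversampling adds no new information, only repeated rows, whereas the SMOTE family synthesizes new minority samples by interpolation. The abstract reports that the simpler RandomOverSampler, paired with CatBoost, performed best on this dataset.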

Authors

Nayoung Kim

Journals

Korean Society for Educational Evaluation
