What question did this study set out to answer?

To establish a leakage-aware evaluation protocol for predicting employee attrition using imbalanced data.

March 25, 2026 | Open Access

Leakage-Free Evaluation for Employee Attrition Prediction on Tabular Data

Key Points

Proposed a reproducible evaluation protocol for employee attrition prediction.
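Applied SMOTE only within the training portion of each fold during stratified 5-fold cross-validation; the train/test split was made before any rebalancing.
Applied one-hot encoding consistently using pd.get_dummies.
Benchmarked Logistic Regression, Random Forest, ExtraTrees, LightGBM, and XGBoost using imbalance-aware metrics (minority-class F1, Average Precision, and ROC-AUC).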
XGBoost achieved the highest mean Average Precision of 0.556 ± 0.056 in cross-validation.
Logistic Regression attained the highest mean F1 score of 0.439 ± 0.048.
LightGBM showed the best mean ROC-AUC of 0.791 ± 0.026.
On the held-out test set, XGBoost delivered a precision of 0.65 and a recall of 0.45 at a fixed threshold of 0.5.
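The fold-local resampling described in the key points can be illustrated with a minimal, standard-library sketch. This is not the authors' code: the function names (stratified_folds, smote_like, leakage_free_cv), the toy data, and the seeds are all hypothetical, and smote_like is a simplified stand-in that interpolates toward the single nearest minority neighbour rather than a full SMOTE implementation. The held-out test split, which would be taken before this loop, is omitted for brevity.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Assign every index to one of k folds, keeping class ratios roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def smote_like(X, y, minority=1, seed=0):
    """Simplified SMOTE-style oversampling (assumption: 1 nearest neighbour).

    Adds synthetic minority rows by interpolating between a minority row and
    its nearest minority neighbour until both classes have equal counts.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, lab in zip(X, y) if lab == minority]
    n_needed = (len(y) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    if len(minority_pts) < 2:
        return X_new, y_new
    for _ in range(max(n_needed, 0)):
        a = rng.choice(minority_pts)
        # Nearest other minority row by squared Euclidean distance.
        b = min((p for p in minority_pts if p is not a),
                key=lambda p: sum((u - v) ** 2 for u, v in zip(a, p)))
        t = rng.random()
        X_new.append([u + t * (v - u) for u, v in zip(a, b)])
        y_new.append(minority)
    return X_new, y_new


def leakage_free_cv(X, y, k=5, seed=0):
    """Yield (X_train, y_train, X_val, y_val); only the training part is resampled."""
    folds = stratified_folds(y, k, seed)
    for f, val_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        X_tr, y_tr = smote_like([X[i] for i in train_idx],
                                [y[i] for i in train_idx], seed=seed + f)
        # Validation rows are the original, untouched observations.
        yield X_tr, y_tr, [X[i] for i in val_idx], [y[i] for i in val_idx]


# Hypothetical imbalanced toy data: 16 positives out of 100 rows.
_rng = random.Random(42)
X = [[_rng.gauss(0, 1), _rng.gauss(0, 1)] for _ in range(100)]
y = [1 if i < 16 else 0 for i in range(100)]

results = list(leakage_free_cv(X, y, k=5))
```

In each fold the training labels end up balanced while the validation rows remain original observations, which is the property the protocol relies on. With scikit-learn and imbalanced-learn, a similar effect is commonly obtained by placing SMOTE inside an imblearn Pipeline that is evaluated with StratifiedKFold, so that resampling is refit on each training fold.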

Abstract

In the context of employee attrition prediction using imbalanced tabular data, we propose a reproducible, leakage-aware evaluation protocol and validate it on the IBM HR Attrition dataset. We perform the train/test split prior to any rebalancing; SMOTE (Synthetic Minority Over-sampling Technique) is applied exclusively within the training portion of each fold in stratified 5-fold cross-validation, while the test set remains untouched. One-hot encoding is performed consistently using pd.get_dummies. We benchmark Logistic Regression, Random Forest, ExtraTrees, LightGBM, and XGBoost using imbalance-aware metrics: F1 for the minority class, PR-AUC reported as Average Precision (AP), and ROC-AUC, each reported both in cross-validation and on the held-out test set. XGBoost attains the best mean AP in cross-validation (0.556 ± 0.056). Logistic Regression achieves the highest mean F1 (0.439 ± 0.048), while LightGBM yields the best mean ROC-AUC (0.791 ± 0.026). On the test set, XGBoost achieves a precision of 0.65 and a recall of 0.45 at a fixed threshold of 0.5. Overall, the results highlight a trade-off between stable minority-class detection (Logistic Regression) and stronger risk-ranking performance (boosting models) under class imbalance.
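To make the metrics named in the abstract concrete, the following standard-library sketch computes minority-class precision, recall, and F1 at a fixed threshold, plus Average Precision as the recall-weighted sum of precisions over the ranked predictions. The function names and the toy labels and scores are hypothetical and unrelated to the paper's data; score ties are not handled specially, which is an assumption of this sketch.

```python
def precision_recall_f1(y_true, scores, threshold=0.5):
    """Precision, recall, and F1 for the positive (minority) class."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(y_true, scores):
    """AP = sum_n (R_n - R_{n-1}) * P_n over predictions ranked by score.

    Each positive raises recall by 1 / n_pos, so AP is the mean of the
    precision values measured at the rank of every positive example.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision at this rank
    return total / n_pos if n_pos else 0.0


# Hypothetical toy labels and predicted probabilities.
y_true = [1, 0, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.4, 0.1]

precision, recall, f1 = precision_recall_f1(y_true, scores, threshold=0.5)
ap = average_precision(y_true, scores)
```

Precision, recall, and F1 depend on the chosen threshold, whereas AP depends only on how the scores rank the examples. That distinction is one way to read the reported trade-off between threshold-based detection and risk ranking.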

Cite This Study

Căvescu et al. studied this question.

synapsesocial.com/papers/69c37be2b34aaaeb1a67ebef https://doi.org/10.3390/info17030308
