Employee turnover is a significant challenge for modern organizations, often causing operational disruption, substantial hiring costs, and a loss of institutional knowledge. Traditional human resource practices have historically been reactive, whereas machine learning offers a proactive way to anticipate and mitigate attrition before it occurs. This research uses the IBM HR Analytics dataset, which contains 1,470 employee records and 35 features, to develop a hybrid machine learning model intended to improve the accuracy of turnover prediction. The preprocessing phase removed non-informative features, applied label encoding to categorical variables, and used StandardScaler to standardize numerical values.

Because HR data are typically imbalanced (employees who leave are usually the minority class), the study applied a hybrid sampling strategy that combines the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) to give the algorithms a more balanced training set. The core of the predictive engine is a soft voting ensemble of three algorithms: Random Forest, XGBoost, and logistic regression. Evaluated on an 80/20 train–test split, the tuned XGBoost model achieved 84% accuracy and an Area Under the Curve (AUC) of 0.80, while the logistic regression component achieved the highest F1-score. These results indicate that the hybrid model is robust and reliable for identifying at-risk employees. Beyond prediction, the study prioritized interpretability, using SHapley Additive exPlanations (SHAP) to identify the primary drivers of attrition.
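The preprocessing, resampling, ensembling, and evaluation steps described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code: every function name is hypothetical, and because the abstract does not say how SMOTE and ADASYN were combined, the sketch shows SMOTE's interpolation and ADASYN's weighting step separately rather than guessing at the hybrid. A production pipeline would more likely rely on scikit-learn's `LabelEncoder`, `StandardScaler`, and `VotingClassifier(voting="soft")` together with imbalanced-learn's `SMOTE` and `ADASYN`.

```python
# Hypothetical, dependency-free sketch of the pipeline steps; not the study's implementation.
import math
import random


def label_encode(values):
    """Map each distinct category to an integer code in sorted order (like LabelEncoder)."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def standardize(column):
    """Z-score a numeric column with the population std (ddof=0), as StandardScaler does."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    # A constant column has std 0; map it to 0.0 rather than divide by zero.
    return [(v - mean) / std if std else 0.0 for v in column]


def _nearest(point, candidates, k):
    """Indices of the k candidates closest to `point` (Euclidean), skipping `point` itself."""
    others = [i for i, c in enumerate(candidates) if c is not point]
    return sorted(others, key=lambda i: math.dist(point, candidates[i]))[:k]


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic sample lies at a random position on the
    segment between a random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = minority[rng.choice(_nearest(base, minority, k))]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


def adasyn_allocation(minority, majority, n_new, k=5):
    """ADASYN's weighting step: minority points whose k nearest neighbours (over both
    classes) are mostly majority samples receive more of the n_new synthetic samples."""
    pool = minority + majority  # indices >= len(minority) belong to the majority class
    ratios = [sum(i >= len(minority) for i in _nearest(p, pool, k)) / k for p in minority]
    total = sum(ratios)
    if total == 0:
        raise ValueError("no minority sample has majority-class neighbours")
    return [round(r / total * n_new) for r in ratios]


def soft_vote(model_probs, weights=None):
    """Soft voting: weighted mean of each model's P(leave) per employee, then predict 1
    (leave) when the mean exceeds 0.5. `model_probs` holds one probability list per model."""
    weights = weights or [1.0] * len(model_probs)
    avg = [sum(w * p for w, p in zip(weights, col)) / sum(weights) for col in zip(*model_probs)]
    return [int(a > 0.5) for a in avg], avg


def roc_auc(labels, scores):
    """AUC as the probability that a random positive scores above a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice, oversampling is normally fitted on the training portion of the split only, so that synthetic points derived from test employees cannot leak into evaluation.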
The analysis showed that the variables with the greatest influence on the model's attrition predictions include the interaction between job level and experience, frequent overtime, monthly income, current job level, and total years at the company. By surfacing these data-driven insights, the model can help HR teams move from reactive troubleshooting to proactive retention planning, supporting the organization's talent retention and stability.
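SHAP attributes each prediction to individual features using Shapley values. The sketch below computes those values exactly, by enumerating every feature coalition, for a hypothetical three-feature toy risk model against a single all-zero baseline. The model, the feature meanings (overtime, low job level, low experience), and the single-baseline value function are all illustrative assumptions; the SHAP library instead estimates these values for real models such as XGBoost, typically by averaging over background data. Because the toy model contains a product term, its contribution is split evenly between the two interacting features, which is one way an interaction such as job level with experience can show up in an attribution.

```python
# Exact Shapley values for a hypothetical toy model; illustrative only, not the SHAP library.
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """phi_i = sum over coalitions S without i of |S|!(n-|S|-1)!/n! * (v(S+i) - v(S)),
    where v(S) evaluates the model with features in S at x and the rest at the baseline."""
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_risk(z):
    """Hypothetical score: z[0] = overtime, z[1] = low job level, z[2] = low experience."""
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_risk, x=[1, 1, 1], baseline=[0, 0, 0])
# phi is approximately [2.0, 0.5, 0.5]; the values sum to toy_risk(x) - toy_risk(baseline) = 3.
```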
Muna I. Alyousef
Hamza Wazir Khan
Mian Usman Sattar
Information
University of Derby
University of Ha'il
Namal College
DOI: https://doi.org/10.3390/info17020208