April 12, 2026 | Open Access

Development and validation of a machine learning model for predicting occupational stress among primary and secondary school teachers

Key Points

Aimed to develop and validate machine learning models for predicting occupational stress in teachers from self-reported data
Conducted a cross-sectional study of 2,832 in-service teachers in Lanzhou, China
Used sociodemographic, work-related, and lifestyle factors to model occupational stress
Compared six machine learning models, including Extreme Gradient Boosting (XGBoost) and Logistic Regression
Evaluated model performance using the area under the ROC curve (AUC), accuracy, F1-score, and Decision Curve Analysis
Applied SHapley Additive exPlanations (SHAP) for model interpretation
Reported a 33.3% prevalence of occupational stress among surveyed teachers
Found that XGBoost performed best on the validation set, with an AUC of 0.620 and an accuracy of 0.603
Identified weekly exercise time, sex, and age as the most influential predictors
Developed a user-friendly, web-based tool from the optimal model
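The key points cite AUC, accuracy, and F1-score. The sketch below shows how these three metrics are commonly defined for a binary outcome, assuming labels coded 1 = occupational stress and 0 = no stress, and probability scores from any fitted classifier. The function names, the 0.5 cut-off, and the toy data are illustrative assumptions, not the study's code or data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = stressed): 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation). O(P * N): fine for a demo.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data for illustration only; NOT the study's data.
    y_true = [1, 0, 1, 0, 0, 1, 0, 0]
    scores = [0.8, 0.3, 0.4, 0.6, 0.2, 0.7, 0.5, 0.1]
    threshold = 0.5  # hypothetical cut-off for turning scores into labels
    y_pred = [int(s >= threshold) for s in scores]
    print(f"accuracy={accuracy(y_true, y_pred):.3f}")  # accuracy=0.625
    print(f"F1={f1_score(y_true, y_pred):.3f}")        # F1=0.571
    print(f"AUC={roc_auc(y_true, scores):.3f}")        # AUC=0.867
```

AUC depends only on how the scores rank positive versus negative cases, so it does not change with the threshold; accuracy and F1, by contrast, shift with whatever cut-off is used to turn probabilities into yes/no predictions.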

Abstract

Background: Traditional linear tools have limitations in predicting teacher occupational stress. To address them, this study aimed to develop and validate machine learning models using easily obtainable, self-reported data.

Methods: A cross-sectional study was conducted among 2,832 in-service teachers in Lanzhou, China. The presence of occupational stress, defined by the Core Occupational Stress Scale, was modeled from sociodemographic, work-related, and lifestyle factors. The dataset was split into a training set (80%) and a validation set (20%), which were used to compare six machine learning models: Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), a Backpropagation Neural Network, Elastic Net, Logistic Regression, and a Support Vector Machine. Model performance was evaluated with the Area Under the Receiver Operating Characteristic Curve (AUC), accuracy, F1-score, and Decision Curve Analysis (DCA). The optimal model was interpreted with SHapley Additive exPlanations (SHAP).

Results: The prevalence of occupational stress was 33.3%. On the validation set, the XGBoost model performed best, with an AUC of 0.620, an accuracy of 0.603, and an F1-score of 0.682. DCA confirmed that this model provided the highest net benefit. The LightGBM and Neural Network models showed substantial overfitting. SHAP analysis identified weekly exercise time, sex, and age as the most influential predictors. A user-friendly, web-based tool was developed from the final model.

Conclusion: Machine learning, particularly the XGBoost algorithm, can effectively predict occupational stress in teachers. This approach offers a promising tool for early identification, enabling targeted interventions.
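Decision Curve Analysis compares a model's net benefit with two reference strategies: intervening for every teacher ("treat all") or for none ("treat none"). The sketch below uses the standard net-benefit definition, NB(pt) = TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability at which one would act. The function names, thresholds, and toy data are hypothetical; the study's own DCA software and threshold range are not stated here.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], risk: Sequence[float], pt: float) -> float:
    """Net benefit of acting on every case whose predicted risk is >= pt.

    Each false positive is penalised by the threshold odds pt / (1 - pt).
    """
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, r in zip(y_true, risk) if r >= pt and t == 1)
    fp = sum(1 for t, r in zip(y_true, risk) if r >= pt and t == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy: intervene for every teacher regardless of predicted risk."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# The other reference strategy, "treat none", has a net benefit of 0 at every threshold.

if __name__ == "__main__":
    # Invented toy labels and predicted risks; NOT the study's data.
    y_true = [1, 0, 1, 0, 0, 1, 0, 0]
    risk = [0.8, 0.3, 0.4, 0.6, 0.2, 0.7, 0.5, 0.1]
    for pt in (0.25, 0.4):
        print(
            f"pt={pt:.2f} model={net_benefit(y_true, risk, pt):.3f} "
            f"all={net_benefit_treat_all(y_true, pt):.3f} none=0.000"
        )
    # pt=0.25 model=0.250 all=0.167 none=0.000
    # pt=0.40 model=0.208 all=-0.042 none=0.000
```

A model is worth using at a given threshold only where its net benefit exceeds both reference strategies; in this toy example the model beats "treat all" and "treat none" at both thresholds shown.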
