Background: To address the limitations of traditional linear tools in predicting teacher occupational stress, this study aimed to develop and validate machine learning models using easily obtainable, self-reported data.

Method: This cross-sectional study included 2,832 in-service teachers in Lanzhou, China. The presence of occupational stress, as defined by the Core Occupational Stress Scale, was modeled using sociodemographic, work-related, and lifestyle factors. The dataset was partitioned into training (80%) and validation (20%) sets, which were used to compare six machine learning models: Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), a backpropagation neural network, elastic net, logistic regression, and a support vector machine. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, F1-score, and decision curve analysis (DCA). The optimal model was interpreted using the SHapley Additive exPlanations (SHAP) method.

Results: The prevalence of occupational stress was 33.3%. On the validation set, the XGBoost model performed best, with an AUC of 0.620, an accuracy of 0.603, and an F1-score of 0.682. DCA confirmed that this model provided the highest net benefit. The LightGBM and neural network models showed marked overfitting. SHAP analysis identified weekly exercise time, sex, and age as the most influential predictors. A user-friendly, web-based tool was developed from the final model.

Conclusion: Machine learning, particularly the XGBoost algorithm, can effectively predict occupational stress in teachers. This approach offers a promising tool for early identification, enabling targeted interventions.
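The abstract does not say how its validation metrics were computed or which software was used. As a minimal, dependency-free sketch, assuming made-up toy labels and predicted probabilities (not study data), a 0.5 cutoff for accuracy and F1 (the study's cutoff is not stated), and hypothetical function names, the four reported measures can be computed as follows. The AUC uses the Mann-Whitney formulation, and the net benefit uses the standard decision-curve formula TP/n − FP/n × p_t/(1 − p_t).

```python
"""Sketch of validation-set metrics: AUC, accuracy, F1, and decision-curve net benefit.

Assumptions (not from the study): toy data, a 0.5 cutoff for accuracy/F1,
and hypothetical function names. Standard library only.
"""
from typing import Sequence, Tuple


def auc_mann_whitney(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def confusion_counts(y_true, y_score, threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), predicting 'stressed' when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy_and_f1(y_true, y_score, threshold: float = 0.5) -> Tuple[float, float]:
    """Accuracy and F1-score at a fixed cutoff (0.5 here is an assumption)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    accuracy = (tp + tn) / len(y_true)
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def net_benefit(y_true, y_score, pt: float) -> float:
    """Decision-curve net benefit at threshold probability pt: TP/n - FP/n * pt/(1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, y_score, pt)
    n = len(y_true)
    return tp / n - (fp / n) * (pt / (1.0 - pt))


def net_benefit_treat_all(y_true, pt: float) -> float:
    """Reference strategy that flags every teacher; 'treat none' has net benefit 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (pt / (1.0 - pt))


if __name__ == "__main__":
    y = [1, 0, 1, 0, 0, 1]               # toy outcomes (1 = stressed)
    p = [0.8, 0.3, 0.6, 0.55, 0.2, 0.4]  # toy predicted probabilities
    print("AUC:", round(auc_mann_whitney(y, p), 3))
    print("accuracy, F1:", accuracy_and_f1(y, p))
    for pt in (0.1, 0.25, 0.4):
        print(pt, round(net_benefit(y, p, pt), 3), round(net_benefit_treat_all(y, pt), 3))
```

A full decision curve evaluates `net_benefit` over a grid of threshold probabilities and plots it against the treat-all and treat-none (net benefit 0) references; a model offers the highest net benefit over the range of thresholds where its curve lies above both references and above the competing models.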
Wang et al. studied this question.