Same-level falls are the most frequent occupational accidents, yet traditional manual analysis of accident reports is labor-intensive, which limits large-scale prevention strategies. In this pilot study, we evaluated how accurately large language models (LLMs) can classify occupational accident text data without task-specific pretraining. We analyzed 2619 same-level-fall-related injury cases, using expert manual classification as the reference standard. Four models (GPT-4o mini, GPT-4.1 mini, GPT-4.1, and o4-mini) were compared using accuracy and Cohen’s kappa. The o4-mini model performed best, showing statistical superiority in the complex “causal agent” category, where it reached 72.8% accuracy. For the other classification tasks, the top models achieved accuracies of 82–92% with Cohen’s kappa coefficients > 0.7, indicating substantial agreement with expert judgments. These findings suggest that LLMs can classify occupational accident text in substantial agreement with the expert-derived reference standard, at least in this dataset. This automated approach enables efficient, high-frequency analysis of large datasets and offers a promising tool for large-scale occupational accident surveillance and screening.
Ando et al. studied this question.
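The two evaluation metrics named in the abstract, accuracy and Cohen’s kappa, can be illustrated with a minimal sketch. It is not the authors' code: the function names, the category labels ("floor", "stairs", "other"), and the ten toy cases are hypothetical and chosen only to show how agreement between an expert reference standard and model output would be computed.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Fraction of cases where the model label equals the expert label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def cohens_kappa(reference, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(reference)
    p_observed = accuracy(reference, predicted)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement: sum over categories of the product of the two
    # raters' marginal proportions for that category.
    p_expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_expected == 1.0:
        # Degenerate case: both raters used a single identical category.
        # Returning 1.0 here is a convention assumed for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data, not taken from the study.
expert = ["floor", "floor", "stairs", "floor", "stairs",
          "other", "floor", "stairs", "other", "floor"]
model = ["floor", "floor", "stairs", "stairs", "stairs",
         "other", "floor", "floor", "other", "floor"]

acc = accuracy(expert, model)
kappa = cohens_kappa(expert, model)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

Because kappa discounts the agreement expected by chance, it comes out lower than raw accuracy on the same labels, which is why the abstract reports both figures.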