Occupational safety analytics is increasingly moving toward data-driven methodologies; however, existing models often struggle to capture the multidimensional nature of accident causation. This study presents a multimodal Hybrid Transformer-LSTM framework for classifying occupational fatalities by jointly modeling unstructured narratives, cyclical temporal features, and regional spatial indicators. Using a large-scale dataset of 14,914 OSHA fatality records, the proposed architecture leverages BERT-based embeddings to extract semantic content from the narratives and Bidirectional LSTMs as non-linear pattern encoders for the spatiotemporal context. Conceptually grounded in the Swiss Cheese Model, the framework treats each data modality as a proxy for a distinct layer of system risk, ranging from proximal unsafe acts to environmental preconditions. Experimental results show that the multimodal architecture achieves an accuracy of 84.56%, a 5.33% gain over unimodal BERT baselines. To address the inherent “black-box” nature of deep learning, the framework incorporates SHAP-based explanations that quantify the contributions of both textual tokens and environmental features to the model’s decisions. The results indicate that integrating narrative semantics with temporal and spatial context improves discriminative performance and enables context-aware classification in a weakly supervised setting. By providing a scalable and interpretable classification framework, this study offers a data-driven decision-support approach for safety professionals and regulatory bodies seeking to implement evidence-based risk management strategies in high-risk industrial sectors.
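The abstract does not say how the cyclical temporal features are constructed. A common approach, sketched below purely as an assumption, is to map each periodic field onto the unit circle with a sine/cosine pair so that, for example, December and January end up adjacent rather than 11 units apart. The field choice (month, weekday, hour) and the helper names `cyclical_encode` and `encode_timestamp` are hypothetical and are not taken from the study.

```python
import math


def cyclical_encode(value: float, period: float) -> tuple:
    """Map a periodic value (0 <= value < period) onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encode_timestamp(month: int, weekday: int, hour: int) -> list:
    # Hypothetical field conventions (assumed, not from the study):
    # month in 1..12, weekday in 0..6 (Monday = 0), hour in 0..23.
    features = []
    features.extend(cyclical_encode(month - 1, 12))  # shift month to 0..11
    features.extend(cyclical_encode(weekday, 7))
    features.extend(cyclical_encode(hour, 24))
    return features  # six floats, each in [-1, 1]


if __name__ == "__main__":
    dec, jan, feb = (cyclical_encode(m - 1, 12) for m in (12, 1, 2))
    # December -> January is as close as January -> February on the circle.
    print(math.dist(dec, jan), math.dist(jan, feb))
    print(encode_timestamp(month=3, weekday=2, hour=14))
```

Encoding each field as a (sin, cos) pair rather than a single raw integer avoids an artificial discontinuity at the period boundary. Using both components also keeps every time point uniquely identifiable, since a sine value alone is ambiguous. Vectors like these could then serve as the per-record temporal input to a sequence encoder such as the BiLSTM branch, although the exact input layout used in the study is not specified here.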
Esin Ayşe Zaimoğlu studied this question.