Predicting the severity of traffic accidents remains challenging because traditional approaches typically depend on structured data alone and have limited ability to extract deep semantic information from unstructured accident narratives. This study proposes a severity prediction approach enhanced by semantic risk reasoning derived from large language models (LLMs). A prompt-engineering template is designed to guide LLMs in extracting proxy semantic features from accident descriptions, forming an enriched feature set that incorporates causal logic. These semantic features are fused with traditional structured features through three integration strategies: direct feature concatenation, optimized feature selection, and model-level fusion. Experiments on 4013 accident records from expressways in Yunnan Province, China, demonstrate that models using LLM-derived semantic features significantly outperform those relying solely on structured features. Notably, the LightGBM model using semantic features within a balanced learning framework achieves a severe-accident recall of 77.8%. While model-level fusion proves optimal for XGBoost (improving Macro-F1 to 0.6356), we identify a “feature dilution” effect in other classifiers, where high-quality semantic reasoning is compromised by low-quality structured noise. These findings indicate that the proposed approach effectively enhances the identification of high-risk accidents and offers a novel semantic-aware solution for traffic safety management. The results also provide actionable insights for traffic management agencies to optimize emergency response resource allocation and formulate targeted accident prevention strategies.
Hao et al. (Thu,) studied this question.
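The abstract reports two evaluation metrics, severe-accident recall and Macro-F1. As a minimal sketch of how such metrics are conventionally computed (not the authors' evaluation code), the following Python uses only the standard library. The function names `per_class_scores` and `macro_f1`, the class labels, and the toy data are assumptions invented for illustration and are not taken from the study.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label observed."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed
    scores = {}
    for label in labels:
        pred_total = tp[label] + fp[label]
        true_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / true_total if true_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy labels, invented for illustration only (not data from the study).
y_true = ["severe", "severe", "severe", "minor", "minor", "minor", "minor", "minor"]
y_pred = ["severe", "severe", "minor", "minor", "minor", "minor", "severe", "minor"]

severe_recall = per_class_scores(y_true, y_pred)["severe"][1]
print(f"severe recall: {severe_recall:.3f}")            # severe recall: 0.667
print(f"macro-F1:      {macro_f1(y_true, y_pred):.3f}")  # macro-F1:      0.733
```

Because Macro-F1 averages per-class F1 without weighting by class frequency, it is sensitive to performance on the rarer severe class, which is why it is a common companion to severe-accident recall on imbalanced data.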