What question did this study set out to answer?

The aim is to improve predictions of traffic accident severity by extracting deep semantic information from accident narratives.

January 17, 2026 · Open Access

Traffic Accident Severity Prediction via Large Language Model-Driven Semantic Feature Enhancement

Key Points

Developed a prompt-engineering template for large language models to extract semantic features.
Integrated LLM-derived features with traditional structured data using three strategies: feature concatenation, feature selection, and model-level fusion.
Conducted experiments with 4013 accident records from expressways in Yunnan Province, China.
Models using LLM-derived semantic features outperformed those using only structured features.
The LightGBM model, using semantic features within a balanced learning framework, achieved a severe-accident recall of 77.8%.
Model-level fusion worked best for XGBoost, raising its Macro-F1 to 0.6356.
In other classifiers, a "feature dilution" effect was identified: low-quality structured noise compromised high-quality semantic reasoning.
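
The three integration strategies named above can be illustrated with a minimal sketch. This summary does not specify how each strategy was implemented, so everything here is an assumption for illustration: the function names, the toy feature values, the hand-picked selection indices, and in particular the use of a weighted average of class probabilities as the form of model-level fusion.

```python
from typing import List, Sequence

Vector = List[float]


def concatenate_features(structured: Vector, semantic: Vector) -> Vector:
    """Strategy 1: direct concatenation of structured and semantic features."""
    return list(structured) + list(semantic)


def select_features(features: Vector, keep_indices: Sequence[int]) -> Vector:
    """Strategy 2: keep only a chosen subset of the concatenated features.

    How the subset is chosen (importance scores, a search, etc.) is not
    described in the summary; the indices here are supplied by the caller.
    """
    return [features[i] for i in keep_indices]


def fuse_model_outputs(
    p_structured: Vector, p_semantic: Vector, weight_semantic: float = 0.5
) -> Vector:
    """Strategy 3 (assumed form): weighted average of the class probabilities
    produced by a structured-only model and a semantic-only model."""
    if len(p_structured) != len(p_semantic):
        raise ValueError("both models must predict the same set of classes")
    w = weight_semantic
    return [(1 - w) * ps + w * pm for ps, pm in zip(p_structured, p_semantic)]


# Toy values, not taken from the study.
structured = [0.0, 1.0, 3.0]   # e.g. encoded weather, road type, vehicle count
semantic = [0.8, 0.1]          # e.g. LLM-derived risk scores

combined = concatenate_features(structured, semantic)
selected = select_features(combined, keep_indices=[1, 3, 4])
fused = fuse_model_outputs([0.7, 0.3], [0.4, 0.6], weight_semantic=0.5)
```

Under these assumptions, concatenation and selection both feed one classifier, while model-level fusion keeps two classifiers separate and only combines their outputs. That separation is one plausible reason fusion could shield semantic features from noisy structured inputs, though the summary itself does not explain the mechanism.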

Abstract

Predicting the severity of traffic accidents remains challenging due to the limited ability of existing methods to extract deep semantic information from unstructured accident narratives, as traditional approaches typically depend on structured data alone. This study proposes a severity prediction approach enhanced by semantic risk reasoning derived from large language models (LLMs). A prompt-engineering template is designed to guide LLMs in extracting proxy semantic features from accident descriptions, forming an enriched feature set that incorporates causal logic. These semantic features are fused with traditional structured features through three integration strategies—direct feature concatenation, optimized feature selection, and model-level fusion. Experiments based on 4013 accident records from expressways in Yunnan Province, China, demonstrate that models using LLM-derived semantic features significantly outperform those relying solely on structured features. Notably, the LightGBM model utilizing semantic features within a balanced learning framework achieves a severe accident recall of 77.8%. While model-level fusion proves optimal for XGBoost (improving Macro-F1 to 0.6356), we identify a “feature dilution” effect in other classifiers, where high-quality semantic reasoning is compromised by low-quality structured noise. These findings indicate that the proposed approach effectively enhances the identification of high-risk accidents and offers a novel semantic-aware solution for traffic safety management. Furthermore, the obtained results provide actionable insights for traffic management agencies to optimize emergency response resource allocation and formulate targeted accident prevention strategies.
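
For readers unfamiliar with the two metrics reported in the abstract, the following sketch shows the standard definitions of per-class recall and Macro-F1 in plain Python. The labels and predictions are toy values invented for illustration and are unrelated to the Yunnan dataset; the function names are likewise hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def class_counts(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable
) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def recall(y_true, y_pred, label) -> float:
    """Share of true `label` cases that the model actually flagged."""
    tp, _, fn = class_counts(y_true, y_pred, label)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(y_true, y_pred, label) -> float:
    """Harmonic mean of precision and recall for one class."""
    tp, fp, fn = class_counts(y_true, y_pred, label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1(y_true, y_pred, label) for label in labels) / len(labels)


# Toy example with an imbalanced label distribution (3 severe, 5 minor).
y_true = ["severe", "severe", "severe", "minor", "minor", "minor", "minor", "minor"]
y_pred = ["severe", "severe", "minor", "minor", "minor", "minor", "severe", "minor"]

severe_recall = recall(y_true, y_pred, "severe")
overall_macro_f1 = macro_f1(y_true, y_pred)
```

Because Macro-F1 averages over classes without weighting by frequency, it rewards models that handle the rarer severe class well, which is why it is commonly paired with severe-accident recall on imbalanced accident data.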
