• Constructed a risk identification framework using K-prototype clustering and an Ordered Logit model.
• Identified a “Kinematic Paradox” in which fatigued drivers maintain high trajectory stability.
• Defined four risk levels based on kinematic urgency and intervention necessity.
• Revealed that lane-changing maneuvers are the primary trigger for extreme risks.
• Generated targeted active safety strategies using Large Language Models (LLMs).

In accident analysis and prevention, passive vehicle safety technologies set the lower limit of driving safety, while active safety technologies determine the upper limit. This study aims to support the active safety management of commercial vehicles by identifying high-risk on-road scenarios. First, taking regular-route passenger buses as the research object, the study fuses multi-source data (driving alarm data, vehicle trajectory data from the 5 min before each alarm, driver video from the 10 s before each alarm, and driving recorder video) to extract key features and construct a set of driving risk identification variables. Second, the Pearson correlation coefficient and the variance inflation factor (VIF) are applied in sequence to test for collinearity and eliminate redundant variables. Because the variables include both continuous and discrete data, the K-prototype hybrid clustering method is adopted, and the optimal number of clusters is determined to be K = 4. Third, an integrated method of “multi-source heterogeneous data fusion–hybrid variable clustering–Ordered Logit modeling–SHAP interpretability analysis” is constructed. To explore active safety technologies, the study attempts to map the identified driving patterns to ordinal risk levels based on key vehicle kinematic parameters. The Ordered Logit model is then applied to quantify the marginal effects of significant variables.
Finally, by combining the variable distributions of the clustering results with the SHAP interpretability analysis, the core features and key contributing factors of the four risk levels are systematically characterized, and targeted active safety management suggestions are generated with the assistance of Large Language Models (LLMs). The study offers insights for research on vehicle active safety and provides references for the dynamic monitoring and management of commercial vehicles.
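The hybrid clustering step can be illustrated with the standard K-prototype dissimilarity, which adds a squared Euclidean distance on the continuous variables to a weighted mismatch count on the discrete ones. The sketch below is a minimal illustration only, not the study's implementation: the variable names, the sample records, the two prototypes, and the weight `gamma` are all hypothetical, and only the assignment step is shown (the study reports K = 4).

```python
from typing import List, Sequence, Tuple

# A record is (continuous features, discrete features).
Record = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(
    num_a: Sequence[float],
    cat_a: Sequence[str],
    num_b: Sequence[float],
    cat_b: Sequence[str],
    gamma: float,
) -> float:
    """K-prototype dissimilarity: squared Euclidean + gamma * mismatches."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


def assign_to_prototypes(
    records: List[Record], prototypes: List[Record], gamma: float
) -> List[int]:
    """Assign each record to the index of its nearest prototype."""
    labels = []
    for num, cat in records:
        distances = [
            mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
            for p_num, p_cat in prototypes
        ]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical standardized features: (speed variability, lateral acceleration)
# and hypothetical discrete flags: (lane change observed, fatigue alarm raised).
records: List[Record] = [
    ([0.1, 0.2], ("no", "yes")),
    ([1.5, 1.4], ("yes", "no")),
    ([0.2, 0.1], ("no", "no")),
]
prototypes: List[Record] = [
    ([0.0, 0.0], ("no", "yes")),
    ([1.5, 1.5], ("yes", "no")),
]
gamma = 0.5  # assumed weight balancing discrete against continuous features

labels = assign_to_prototypes(records, prototypes, gamma)
```

A full K-prototype run alternates this assignment step with a prototype update (means for continuous variables, modes for discrete ones) until the labels stop changing. The choice of `gamma` controls how strongly the discrete variables influence the partition.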
Xu et al. (Fri,) studied this question.
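The marginal effects referred to in the abstract follow from the standard Ordered Logit form, in which the cumulative probability of risk level j or lower is the logistic function of a cutpoint minus the linear predictor. The sketch below is an illustration under assumed values: the cutpoints, the coefficient `beta_k`, and the linear predictor `xb` are hypothetical and are not estimates from the study.

```python
import math
from typing import List, Sequence


def logistic_cdf(z: float) -> float:
    """Standard logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z: float) -> float:
    """Standard logistic density, the derivative of logistic_cdf."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


def category_probabilities(xb: float, cutpoints: Sequence[float]) -> List[float]:
    """P(Y = j) for each ordered level, given linear predictor xb."""
    cdf = [0.0] + [logistic_cdf(t - xb) for t in cutpoints] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cdf) - 1)]


def marginal_effects(
    xb: float, cutpoints: Sequence[float], beta_k: float
) -> List[float]:
    """dP(Y = j)/dx_k for a continuous variable x_k with coefficient beta_k."""
    pdf = [0.0] + [logistic_pdf(t - xb) for t in cutpoints] + [0.0]
    return [-beta_k * (pdf[j + 1] - pdf[j]) for j in range(len(pdf) - 1)]


# Hypothetical values: three cutpoints separate four ordered risk levels.
cutpoints = [-1.0, 0.5, 2.0]
beta_k = 0.8  # assumed coefficient of one continuous risk variable
xb = 0.8      # assumed linear predictor at the evaluation point

probs = category_probabilities(xb, cutpoints)
effects = marginal_effects(xb, cutpoints, beta_k)
```

Because the level probabilities always sum to one, the marginal effects across levels sum to zero. A positive coefficient therefore shifts probability away from the lowest risk level and toward the highest, while the sign for intermediate levels depends on the evaluation point.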