• State-of-the-art deep learning models power advanced semantic search in crash narrative analysis.
• An adaptive sliding window strategy enables precise scanning of extensive narrative data.
• The developed approach achieves up to 96% recall on semantically complex queries.
• It outperforms traditional search techniques by 23% to 53.8% in retrieval accuracy.
• Automatic highlighting of key segments minimizes manual review effort and reduces errors in crash data analysis.

Traffic crash narratives are rich in nuanced, context-specific details that remain underutilized because of the constraints of traditional text analysis methods. Advanced language models are a relatively recent development, and their application to crash narratives has so far been limited. We introduce a semantic retrieval framework that combines Transformer-based embeddings with a sliding-window search strategy to rapidly and precisely extract critical insights from extensive crash reports. The method partitions narratives into overlapping segments and generates high-dimensional semantic representations, which enables precise matching between user queries and narrative content while automatically highlighting the most relevant text segments. Our evaluations demonstrate several key advances. First, the sliding-window approach outperforms conventional document-level, sentence-level, and BM25 methods, achieving recall improvements ranging from 23% to 53.8% on challenging queries. A window size of six consistently yielded strong retrieval performance across tasks. Second, a sensitivity analysis indicates that window-based scanning strikes a balance between semantic granularity and computational efficiency. Third, validation across different datasets confirms the cross-domain robustness of our framework. Overall, this study establishes a robust methodological foundation for semantic search in traffic safety research. By significantly reducing manual review effort and enhancing retrieval performance, our approach offers a promising tool for extracting actionable insights from unstructured text data in various domains.
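The sliding-window retrieval idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the unit of the window, so the sketch assumes a window of six words with a stride of three. It also swaps the Transformer-based embeddings for a toy bag-of-words vector so that it runs with the standard library alone. All function names (`tokenize`, `sliding_windows`, `embed`, `cosine`, `best_window`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sliding_windows(tokens, size=6, stride=3):
    """Yield (start_index, window_tokens) for overlapping windows.

    Assumption: windows are measured in words; size and stride are
    illustrative defaults, not values confirmed by the source.
    """
    if len(tokens) <= size:
        yield 0, tokens
        return
    last_start = len(tokens) - size
    for start in range(0, last_start + 1, stride):
        yield start, tokens[start:start + size]
    # Add a final window so the tail of the narrative is always covered.
    if last_start % stride != 0:
        yield last_start, tokens[last_start:]


def embed(tokens):
    """Toy stand-in for a Transformer embedding: bag-of-words counts."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_window(query, narrative, size=6, stride=3):
    """Return (score, start_index, window_text) for the best-matching window.

    The returned window text is the segment a reviewer would see highlighted.
    """
    query_vec = embed(tokenize(query))
    tokens = tokenize(narrative)
    best = (0.0, 0, [])
    for start, window in sliding_windows(tokens, size, stride):
        score = cosine(query_vec, embed(window))
        if score > best[0]:
            best = (score, start, window)
    score, start, window = best
    return score, start, " ".join(window)


if __name__ == "__main__":
    narrative = (
        "Vehicle 1 was traveling north when the driver looked down at a "
        "cell phone and struck vehicle 2 from behind."
    )
    print(best_window("driver cell phone", narrative))
```

In a real system, `embed` would call a sentence-embedding model and the query would match on meaning rather than shared words; the windowing and scoring loop would stay the same.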
Arteaga et al. studied this question.