Traditional road safety analysis is reactive and often hindered by scarce collision data. Traffic conflicts, or near-misses, offer a proactive surrogate for safety assessment: Extreme Value Theory (EVT) is used to extrapolate collision risk from these more frequent events. However, a critical methodological gap is the lack of guidance on prediction reliability. Many studies use short observation periods, yielding predictions with unacceptably wide credible intervals and fostering the misleading impression that such studies can be completed quickly. This paper systematically assesses the reliability of collision predictions derived from conflict data with EVT models. The study uses a large dataset of traffic conflicts collected continuously via LiDAR sensors at four unsignalized intersections in Kitchener, Canada, for periods of up to one year. Using a Peak Over Threshold approach in a Bayesian framework, the analysis evaluates how collision estimates and their 95% credible intervals converge as the data collection period increases from two to 365 days. The results show that while mean collision predictions can stabilize with limited data, the associated credible intervals for short collection periods are so wide that they offer practically no meaningful information. Because severe conflicts remain rare, hundreds of days of continuous data collection are required for reliable results. The research concludes that the common practice of using a few days of data is insufficient for reliable safety analysis. It provides an evidence-based methodology for determining the necessary data collection duration, enabling practitioners to balance resource efficiency with the need for robust and reliable proactive safety assessments.
Aminghafouri et al. (Mon,) studied this question.
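
The convergence behaviour described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a negated time-based conflict indicator (so a collision corresponds to the indicator reaching zero), fits a generalized Pareto distribution to threshold exceedances with a simple random-walk Metropolis sampler, and uses synthetic data. The priors, the parameter values, the 1.0 s gap between threshold and collision, and all function names are hypothetical choices made for illustration, and the number of exceedances stands in for the length of the collection period.

```python
import numpy as np

def gpd_log_lik(y, log_sigma, xi):
    """Log-likelihood of exceedances y > 0 under a GPD with scale sigma and shape xi."""
    sigma = np.exp(log_sigma)
    if abs(xi) < 1e-8:                       # exponential limit as xi -> 0
        return -y.size * log_sigma - y.sum() / sigma
    z = 1.0 + xi * y / sigma
    if np.any(z <= 0.0):                     # an observation lies outside the support
        return -np.inf
    return -y.size * log_sigma - (1.0 / xi + 1.0) * np.log(z).sum()

def log_post(y, theta):
    """Unnormalized log-posterior; the weak normal priors are assumptions."""
    log_sigma, xi = theta
    log_prior = -0.5 * (log_sigma / 10.0) ** 2 - 0.5 * xi ** 2
    return gpd_log_lik(y, log_sigma, xi) + log_prior

def sample_posterior(y, n_iter=6000, burn=1000, seed=0):
    """Random-walk Metropolis over (log sigma, xi)."""
    rng = np.random.default_rng(seed)
    step = 1.0 / np.sqrt(y.size)             # heuristic proposal scale
    theta = np.array([np.log(y.mean()), 0.0])
    lp = log_post(y, theta)
    draws = []
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(2)
        lp_prop = log_post(y, prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        if i >= burn:
            draws.append(theta.copy())
    return np.array(draws)

def collision_prob(draws, gap):
    """P(exceedance > gap) for each posterior draw, i.e. the chance that a
    conflict above the threshold is severe enough to count as a collision."""
    sigma, xi = np.exp(draws[:, 0]), draws[:, 1]
    small = np.abs(xi) < 1e-8
    xi_safe = np.where(small, 1.0, xi)
    z = np.maximum(1.0 + xi_safe * gap / sigma, 0.0)   # 0 if beyond the upper endpoint
    p = z ** (-1.0 / xi_safe)
    return np.where(small, np.exp(-gap / sigma), p)

def simulate_exceedances(n, sigma, xi, seed):
    """Synthetic GPD exceedances by inverse-CDF sampling (requires xi != 0)."""
    u = np.random.default_rng(seed).uniform(size=n)
    return sigma / xi * ((1.0 - u) ** (-xi) - 1.0)

def credible_interval(n, gap=1.0, sigma=0.5, xi=-0.2, seed=1):
    """95% credible interval of the collision probability from n exceedances."""
    y = simulate_exceedances(n, sigma, xi, seed)
    p = collision_prob(sample_posterior(y), gap)
    lo, hi = np.percentile(p, [2.5, 97.5])
    return lo, hi

if __name__ == "__main__":
    for n in (20, 500):                      # few vs. many observed severe conflicts
        lo, hi = credible_interval(n)
        print(f"n={n:4d}  95% CI = [{lo:.4f}, {hi:.4f}]  width = {hi - lo:.4f}")
```

Comparing the printed interval widths for the small and large samples gives a rough feel for why the point estimate alone says little about reliability. A real analysis would also need threshold selection, convergence diagnostics for the sampler, and the conversion from per-conflict probability to expected collisions per period, none of which is covered here.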