• Introduces an explainable framework to quantify cross-project risk similarity.
• Demonstrates that risk similarity is driven by context, not project scale.
• Lays a foundation for LLM- and graph-based risk prediction systems.
• Uses GPT-based embeddings to capture semantic similarity in risk registers.

Risk registers contain rich experiential knowledge, yet existing approaches struggle to systematically compare risks across projects and translate past lessons into actionable insights. This study introduces an explainable, data-driven framework for quantifying cross-project risk similarity in transportation construction by integrating GPT-based text embeddings, ensemble learning, and explainable artificial intelligence (XAI). Using over 3500 risk items from 72 transportation projects, the framework measures semantic similarity between project risk profiles and models how project characteristics shape similarity patterns. Ensemble models achieve strong predictive performance (R² = 0.85), while XAI analysis reveals that risk documentation practices, geographic context, and delivery method dominate similarity outcomes, outweighing project scale and project type. These findings demonstrate that transferable risk knowledge is primarily context-driven rather than size-driven. The proposed framework provides a robust foundation for future LLM- and graph-based risk prediction systems, enabling more transparent, scalable, and context-aware risk management in transportation infrastructure projects.
Erfani et al. studied this question.
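To make the idea of measuring semantic similarity between project risk profiles concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each risk item has already been converted to an embedding vector, that a project's risk profile is the mean of its risk-item embeddings, and that similarity is the cosine between profiles. The abstract does not specify the aggregation or the similarity measure, so both are assumptions, and the random vectors, project names, and function names below are hypothetical stand-ins for real GPT-based embeddings of risk register text.

```python
import numpy as np


def project_profile(risk_embeddings: np.ndarray) -> np.ndarray:
    """Mean-pool risk-item embeddings into one unit-length project vector.

    Assumption: mean pooling is only one plausible way to summarise a
    risk register; the source does not state how profiles are built.
    """
    mean_vec = risk_embeddings.mean(axis=0)
    return mean_vec / np.linalg.norm(mean_vec)


def similarity_matrix(profiles: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of unit-length profiles."""
    return profiles @ profiles.T


# Hypothetical data: random vectors stand in for GPT-based embeddings.
# Each row is one risk item; projects may have different numbers of items.
rng = np.random.default_rng(0)
embedding_dim = 16
projects = {
    "project_a": rng.normal(size=(40, embedding_dim)),
    "project_b": rng.normal(size=(55, embedding_dim)),
    "project_c": rng.normal(size=(30, embedding_dim)),
}

profiles = np.stack([project_profile(e) for e in projects.values()])
sim = similarity_matrix(profiles)

# Print the pairwise similarity of each project against the others.
for name, row in zip(projects, sim):
    print(name, np.round(row, 3))
```

In a setup like this, the off-diagonal entries of `sim` would serve as the target values that a downstream model tries to explain from project characteristics such as geographic context or delivery method.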