What question did this study set out to answer?

The research aims to quantify risk similarity across transportation construction projects using machine learning.

March 7, 2026 · Open Access

What drives risk similarity across construction projects? An explainable machine learning analysis using GPT-based embeddings

Key Points

Developed a framework integrating GPT-based text embeddings and ensemble learning.
Analyzed over 3500 risk items from 72 transportation projects.
Measured semantic similarity between project risk profiles.
Employed explainable AI to identify key factors affecting risk similarity.
Achieved strong predictive performance with ensemble models (R² = 0.85).
Identified risk documentation practices, geographic context, and delivery methods as dominant factors.
Showed that risk knowledge transfer is primarily influenced by context rather than project scale.
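
For context on the reported score: the summary does not define R², so the following assumes the conventional coefficient of determination. The symbols are illustrative, not taken from the paper: y_i is the measured similarity for project pair i, ŷ_i is the model's prediction, and ȳ is the mean measured similarity.

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```

Under that reading, R² = 0.85 would mean the ensemble models account for roughly 85% of the variance in the pairwise similarity scores.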

Abstract

• Introduces an explainable framework to quantify cross-project risk similarity.
• Demonstrates that risk similarity is driven by context, not project scale.
• Provides a foundation for LLM- and graph-based risk prediction systems.
• Uses GPT-based embeddings to capture semantic similarity in risk registers.

Risk registers contain rich experiential knowledge, yet existing approaches struggle to systematically compare risks across projects and translate past lessons into actionable insights. This study introduces an explainable, data-driven framework for quantifying cross-project risk similarity in transportation construction by integrating GPT-based text embeddings, ensemble learning, and explainable artificial intelligence (XAI). Using over 3500 risk items from 72 transportation projects, the framework measures semantic similarity between project risk profiles and models how project characteristics shape similarity patterns. Ensemble models achieve strong predictive performance (R² = 0.85), while XAI analysis reveals that risk documentation practices, geographic context, and delivery method dominate similarity outcomes, outweighing project scale or project type. These findings demonstrate that transferable risk knowledge is primarily context-driven rather than size-driven. The proposed framework provides a robust foundation for future LLM- and graph-based risk prediction systems, enabling more transparent, scalable, and context-aware risk management in transportation infrastructure projects.
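
As a rough illustration of what "semantic similarity between project risk profiles" could look like in practice, here is a minimal sketch. It rests on assumptions the abstract does not confirm: each project's profile is taken to be the mean of its risk-item embeddings, and similarity is taken to be cosine similarity. The tiny 3-dimensional vectors are made-up stand-ins for real GPT-based embeddings, and the function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Average equal-length vectors component-wise (assumed profile aggregation)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy embeddings: one vector per risk item, grouped by project (illustrative only).
project_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
project_b = [[1.0, 1.0, 0.0]]
project_c = [[0.0, 0.0, 1.0]]

profile_a = mean_vector(project_a)
profile_b = mean_vector(project_b)
profile_c = mean_vector(project_c)

sim_ab = cosine_similarity(profile_a, profile_b)  # profiles point the same way
sim_ac = cosine_similarity(profile_a, profile_c)  # profiles are orthogonal

print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

In a pipeline like the one the abstract describes, pairwise scores of this kind would presumably serve as the target that the ensemble models predict from project characteristics such as delivery method and geographic context.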
