What question did this study set out to answer?

This study aims to assess the effectiveness of GraphRAG in addressing incidents related to Lockout/Tagout (LOTO) failures in industrial safety.

March 25, 2026 · Open Access

Evaluating GraphRAG for industrial safety: a case study on LOTO procedure failures

Key Points

This study aims to assess the effectiveness of GraphRAG in addressing incidents related to Lockout/Tagout (LOTO) failures in industrial safety.
Integrated a Neo4j-based knowledge graph with GPT-4o
The knowledge graph was constructed from accident narratives sourced from the OSHA database
Evaluated model performance using 150 questions in six task types, scored on six metrics
GraphRAG excelled in tasks aligned with graph structure, such as Summarization, Classification, and Recommendation
Performance limits noted in cognitively demanding tasks such as Reasoning and Comparison
Highlights the importance of structured semantics for enhancing generation quality

Abstract

Large Language Models (LLMs) are transforming information access and decision support across domains, yet their application in safety-critical settings remains limited by hallucination, lack of domain grounding, and limited interpretability. To address these issues, Graph Retrieval-Augmented Generation (GraphRAG) has emerged as a novel paradigm that integrates LLMs with knowledge graphs, enabling more coherent, faithful, and context-aware outputs through structured semantic retrieval. This exploratory study applies GraphRAG to the domain of industrial safety, focusing on incidents involving Lockout/Tagout (LOTO) procedure failures. By integrating a Neo4j-based knowledge graph, constructed from a set of accident narratives extracted from the OSHA database, with the generative capabilities of GPT-4o, we assess the system’s ability to produce coherent, complete, and decision-relevant answers grounded in structured safety data. A total of 150 questions, categorized into six task types, were used to evaluate model performance across six metrics: Coherence, Completeness, Empowerment, Faithfulness, F1 Score, and Relevance. The results highlight GraphRAG’s strengths in tasks aligned with graph structure, particularly Summarization, Classification, and Recommendation, while revealing performance limitations in more cognitively demanding tasks such as Reasoning and Comparison. The evaluation underscores the value of structured semantics in enhancing generation quality but also points to scalability and interpretability challenges.
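
The abstract lists F1 Score among the six metrics and reports results per task type, but it does not say how F1 was computed. The sketch below is one common way to do it for free-text answers, assuming a token-overlap definition (precision and recall over whitespace tokens shared between a generated answer and a reference answer) and averaging per task type. The function names `token_f1` and `mean_f1_by_task`, the tokenization, and the record layout are illustrative assumptions, not the study's actual implementation.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer.

    Assumption: lowercase whitespace tokenization; the study may differ.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1_by_task(records):
    """Average F1 per task type.

    records: iterable of (task_type, prediction, reference) tuples
    (a hypothetical layout chosen for this example).
    """
    scores = {}
    for task, prediction, reference in records:
        scores.setdefault(task, []).append(token_f1(prediction, reference))
    return {task: sum(vals) / len(vals) for task, vals in scores.items()}
```

Grouping scores by task type in this way is what allows a comparison such as Summarization versus Reasoning; the other five metrics named in the abstract would need their own scoring procedures, which the abstract does not describe.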
