What question did this study set out to answer?

The aim is to develop an efficient framework for clinical data governance that addresses data quality issues and preserves semantic integrity.

June 13, 2026

An Efficient and Reliable Agent-Based System for Clinical Data Governance

Key Points

The aim is to develop an efficient framework for clinical data governance that addresses data quality issues and preserves semantic integrity.
Proposed GovernAgent, an LLM-driven framework that combines hierarchical governance with constrained action planning.
Implemented cascading Note- and Section-Level agents that narrow the governance space top-down and break complex anomalies into resolvable, multi-level quality issues.
Evaluated on real-world hospital datasets to assess governance effectiveness and efficiency.
GovernAgent improved governance accuracy and efficiency on these datasets.
Restricting the LLM to a hybrid "Copy-Generate" action space maximized reuse of the original text, mitigating hallucinations and preserving medical provenance.
Demonstrated high practical adaptability and supported downstream clinical applications.

Abstract

Clinical data governance is the cornerstone of reliable intelligent healthcare systems. However, real-world clinical records frequently suffer from complex data quality issues that demand high semantic fidelity and processing efficiency to resolve. Existing section identification and fragmented standardization methods either fail to address these intricate anomalies or inadvertently sacrifice semantic integrity. Meanwhile, directly deploying Large Language Models (LLMs) for this task as free-form text generators introduces hallucinations and computational bottlenecks. To bridge these gaps, we propose GovernAgent, an LLM-driven framework that overcomes these limitations through two core designs. First, inspired by the intrinsic structure of clinical records, our approach introduces a hierarchical governance mechanism. By employing cascading Note- and Section-Level agents, it constrains the governance space in a top-down manner, systematically disentangling these anomalies into resolvable, multi-level quality issues. Second, the framework employs a Constrained Action Planning mechanism. By restricting the LLM to a hybrid "Copy-Generate" action space rather than free-text generation, it maximizes original text reuse, thereby mitigating hallucinations, guaranteeing medical provenance, and ensuring high efficiency. Evaluations on real-world hospital datasets demonstrate that GovernAgent improves governance accuracy and efficiency, minimizes hallucinations, demonstrates high practical adaptability, and empowers downstream clinical applications. Code: https://github.com/kaiyinzhou/GovernAgent.
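
The "Copy-Generate" action space described in the abstract can be illustrated with a minimal sketch. This is not the GovernAgent implementation: the `Copy` and `Generate` classes, the `apply_plan` and `copy_ratio` helpers, and the sample note are all hypothetical names and data chosen for illustration. The sketch assumes a plan is an ordered list of actions, where each action either reuses a character span of the original record or emits a short piece of new text.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Copy:
    """Hypothetical action: reuse source[start:end] verbatim."""
    start: int
    end: int


@dataclass(frozen=True)
class Generate:
    """Hypothetical action: emit a short piece of new text."""
    text: str


Action = Union[Copy, Generate]


def apply_plan(source: str, plan: List[Action]) -> str:
    """Assemble the output text from an ordered list of actions."""
    parts = []
    for action in plan:
        if isinstance(action, Copy):
            # Reject spans outside the source so that every copied
            # character can be traced back to the original record.
            if not (0 <= action.start <= action.end <= len(source)):
                raise ValueError(f"copy span out of range: {action}")
            parts.append(source[action.start:action.end])
        else:
            parts.append(action.text)
    return "".join(parts)


def copy_ratio(source: str, plan: List[Action]) -> float:
    """Fraction of output characters taken verbatim from the source."""
    copied = sum(a.end - a.start for a in plan if isinstance(a, Copy))
    total = len(apply_plan(source, plan))
    return copied / total if total else 0.0


# Illustrative input: expand an abbreviated header, keep the content as-is.
source = "CC chest pain 2 days"
start = source.index("chest")
plan = [Generate("Chief Complaint: "), Copy(start, len(source))]
print(apply_plan(source, plan))
print(copy_ratio(source, plan))
```

In this sketch only the section header is generated, while the clinical content is copied unchanged, so the share of output that can be traced directly to the source is easy to measure. How GovernAgent actually represents and decodes its actions is defined in the linked repository, not here.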
