What question did this study set out to answer?

To assess the accuracy of an AI system in automating colorectal cancer registry data abstraction from electronic medical records.

May 30, 2026

Automated abstraction of colorectal cancer registry data using AI: Accuracy and implementation insights.

Key Points

To assess the accuracy of an AI system in automating colorectal cancer registry data abstraction from electronic medical records.
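Evaluated Synapsis AI (Dyania Health, Inc.) for extracting 33 colorectal cancer registry fields, corresponding to 60 questions per patient, from the EMR.
Included adult patients (18–90 years) treated at Cleveland Clinic with ICD-10 codes C18-C20 first recorded during Q1 2024, capped at 200 cases.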
Compared AI outputs with manually abstracted tumor registry data.
Overall accuracy of 94.78% across all fields, with question-level accuracy of 100% for several key metrics.
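Lowest accuracy was 28%, for residual colorectal tumor presence, primarily because the system returned “NA” when no relevant documentation was found, whereas registry standards required “No.”
Accuracy exceeded 72% across all remaining fields; other errors were linked to incomplete access to imaging reports and insufficient upfront alignment on field definitions and response conventions.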

Abstract

e15516

Background: Cancer registries support surveillance, quality measurement, and research, yet abstraction of high-value clinical variables remains heavily manual. Human registrars must synthesize information across structured fields and unstructured documentation, including pathology, operative reports, imaging, and oncology notes, resulting in time-intensive workflows that are difficult to standardize and scale. Automated chart abstraction of registry variables directly from the electronic medical record (EMR), particularly from unstructured text, could improve the efficiency, timeliness, and consistency of registry data capture and reporting.

Methods: We evaluated Synapsis AI, an AI-based abstraction system developed by Dyania Health, Inc., to automate extraction of colorectal cancer (CRC) registry variables from the EMR. Adult patients (18–90 years) treated at Cleveland Clinic with ICD-10 codes C18-C20 first recorded during Q1 2024 were included, capped at 200 cases. The system was instructed to abstract 33 registry fields corresponding to 60 questions per patient. These fields drew mainly on unstructured data, such as staging, lymph node metrics, biomarkers, treatments, and surgical details, as defined in the STORE and SSDI guidelines. AI outputs were compared with registrar-abstracted tumor registry data.

Results: Overall accuracy across all fields was 94.78%. Question-level accuracy reached 100% for lymph node yield and positivity, pathologic and post-therapy TNM stage, mismatch repair and microsatellite instability status, immunotherapy regimens, and surgical procedures. The lowest accuracy (28%) occurred for residual colorectal tumor presence. This was primarily due to ambiguity in the field definition rather than model error: when no relevant documentation was identified, the system returned “NA,” reflecting a weak negative inference, whereas registry standards required a “No” response. Additional contributors to inaccuracies included incomplete access to imaging reports and insufficient upfront alignment on field definitions and response conventions. Accuracy exceeded 72% across all remaining fields.

Conclusions: These findings demonstrate that AI can accurately abstract CRC registry data at scale. Importantly, performance limitations were largely attributable to the scoping of fields and access to relevant data, rather than model capability. Proper upfront scoping to align definitions, explicit communication of registrar-developed abstraction conventions, and health-system-defined prioritization of sources of truth are critical to achieving registrar-aligned automation. When these elements are addressed, AI systems can function as reliable, scalable extensions of cancer registry workflows.
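The comparison described in the abstract, and the effect of the “NA” versus “No” convention on a single field, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the toy records, the `na_means_no` rule, and the choice to pool all field comparisons into the overall figure are hypothetical and are not taken from the study's data or from the Synapsis AI system.

```python
from collections import defaultdict


def field_accuracy(ai_rows, registry_rows, normalize=None):
    """Compare AI answers with registrar answers, record by record.

    Returns (per_field_accuracy, overall_accuracy). The overall value pools
    every field comparison; whether the study pooled this way is an assumption.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ai, ref in zip(ai_rows, registry_rows):
        for field, ref_value in ref.items():
            ai_value = ai.get(field, "NA")  # a missing answer is treated as "NA"
            if normalize is not None:
                ai_value = normalize(field, ai_value)
            total[field] += 1
            correct[field] += int(ai_value == ref_value)
    per_field = {f: correct[f] / total[f] for f in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_field, overall


def na_means_no(field, value):
    """Hypothetical registrar convention: no documentation means "No"."""
    if field == "residual_tumor" and value == "NA":
        return "No"
    return value


# Toy records, invented for illustration only (not study data).
registry = [
    {"residual_tumor": "No", "msi_status": "MSS"},
    {"residual_tumor": "No", "msi_status": "MSI-H"},
    {"residual_tumor": "Yes", "msi_status": "MSS"},
    {"residual_tumor": "No", "msi_status": "MSS"},
]
ai = [
    {"residual_tumor": "NA", "msi_status": "MSS"},
    {"residual_tumor": "NA", "msi_status": "MSI-H"},
    {"residual_tumor": "Yes", "msi_status": "MSS"},
    {"residual_tumor": "NA", "msi_status": "MSS"},
]

raw_fields, raw_overall = field_accuracy(ai, registry)
aligned_fields, aligned_overall = field_accuracy(ai, registry, normalize=na_means_no)

print(raw_fields["residual_tumor"], raw_overall)          # 0.25 0.625
print(aligned_fields["residual_tumor"], aligned_overall)  # 1.0 1.0
```

In this toy setup the extracted answers are identical in both runs; only the response convention changes. That mirrors the abstract's point that a mismatch in field definitions can lower measured accuracy without any change in what the model found.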
