e15516 Background: Cancer registries support surveillance, quality measurement, and research, yet abstraction of high-value clinical variables remains heavily manual. Human registrars must synthesize information across structured fields and unstructured documentation, including pathology, operative reports, imaging, and oncology notes, resulting in time-intensive workflows that are difficult to standardize and scale. Automated chart abstraction of registry variables directly from the electronic medical record (EMR), particularly from unstructured text, could improve the efficiency, timeliness, and consistency of registry data capture and reporting. Methods: We evaluated Synapsis AI, an AI-based abstraction system developed by Dyania Health, Inc., to automate extraction of colorectal cancer (CRC) registry variables from the EMR. Adult patients (18–90 years) treated at Cleveland Clinic with ICD-10 codes C18–C20 first recorded during Q1 2024 were included, capped at 200 cases. The system was instructed to abstract 33 registry fields, corresponding to 60 questions per patient and drawn mainly from unstructured data, such as staging, lymph node metrics, biomarkers, treatments, and surgical details, as defined in STORE and SSDI guidelines. AI outputs were compared with registrar-abstracted tumor registry data. Results: Overall accuracy across all fields was 94.78%. Question-level accuracy reached 100% for lymph node yield and positivity, pathologic and post-therapy TNM stage, mismatch repair and microsatellite instability status, immunotherapy regimens, and surgical procedures. The lowest accuracy (28%) occurred for residual colorectal tumor presence. This was primarily due to ambiguity in the field definition rather than model error: when no relevant documentation was identified, the system returned “NA,” reflecting a weak negative inference, whereas registry standards required a “No” response.
Additional contributors to inaccuracies included incomplete access to imaging reports and insufficient upfront alignment on field definitions and response conventions. Accuracy exceeded 72% across all remaining fields. Conclusions: These findings demonstrate that AI can accurately abstract CRC registry data at scale. Importantly, performance limitations were largely attributable to the scoping of fields and access to relevant data, rather than to model capability. Proper upfront scoping to align definitions, explicit communication of registrar-developed abstraction conventions, and health-system-defined prioritization of sources of truth are critical to achieving registrar-aligned automation. When these elements are addressed, AI systems can function as reliable, scalable extensions of cancer registry workflows.
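The effect of a response convention on question-level accuracy, as described for the residual tumor field, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field names, the `NA_MEANS_NO` convention table, the helper functions, and the toy records are hypothetical and are not the study's data, scoring code, or the Synapsis AI interface. The sketch only shows how mapping a system “NA” to the registry's expected “No” changes the agreement rate for an affected field.

```python
from collections import defaultdict

# Hypothetical convention table: fields where the registry expects "No"
# when no relevant documentation is found, so a system "NA" counts as "No".
NA_MEANS_NO = frozenset({"residual_tumor_present"})


def normalize(field, value, na_means_no=frozenset()):
    """Lowercase and trim a response, applying the NA-to-No convention if listed."""
    cleaned = value.strip().lower()
    if cleaned == "na" and field in na_means_no:
        return "no"
    return cleaned


def accuracy_by_field(records, na_means_no=frozenset()):
    """Return the fraction of AI responses matching the registrar, per field.

    `records` is an iterable of (field, ai_value, registrar_value) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for field, ai_value, registrar_value in records:
        totals[field] += 1
        ai_norm = normalize(field, ai_value, na_means_no)
        reg_norm = normalize(field, registrar_value, na_means_no)
        if ai_norm == reg_norm:
            hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


# Toy records for illustration only (not study data).
records = [
    ("residual_tumor_present", "NA", "No"),
    ("residual_tumor_present", "NA", "No"),
    ("residual_tumor_present", "Yes", "Yes"),
    ("residual_tumor_present", "No", "No"),
    ("msi_status", "MSI-High", "MSI-High"),
]

strict = accuracy_by_field(records)                # "NA" and "No" treated as different
aligned = accuracy_by_field(records, NA_MEANS_NO)  # convention applied before scoring
```

On these toy records the strict comparison scores the residual tumor field lower than the convention-aligned comparison, while fields outside the convention table are unaffected. Keeping the convention in an explicit table, rather than inside the matching logic, is one way to make registrar-developed rules visible and reviewable before scoring.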
Lindberg et al. (Thu,) studied this question.