PURPOSE: To validate the performance of an AI system (TRIAGE) for cancer trial eligibility screening using real-world longitudinal electronic health records (EHRs), full versioned trial protocols, and expert clinical research coordinator (CRC) adjudication at the trial and criterion levels.
METHODS: This retrospective study of records from August 2017 to April 2025 compared TRIAGE eligibility determinations with real-world enrollment and, for a subset of cases, with expert eligibility review. The training set comprised 148 trials and 628 patients with cancer, resulting in 4,094 patient-trial pairs. The test set comprised 26 trials and 198 patients with breast, lung, and pancreatic cancers, resulting in 820 patient-trial pairs. A stratified random sample of 100 patient-trial pairs (83 patients, 21 trials) underwent manual CRC review and adjudication. The primary outcomes were sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for trial-level eligibility, and accuracy for criterion-level decisions. Performance evaluations were based on a binary classification: eligible or potentially eligible versus ineligible.
RESULTS: At the predefined trial-level decision threshold for eligibility (0.40), the sensitivity was 78.3%, the specificity was 98.5%, the PPV was 92.6%, and the NPV was 94.8%. A lower threshold (0.13) maximized sensitivity (98.7%) while maintaining specificity at 97.6%, PPV at 91.2%, and NPV at 99.7%. Criterion-level evaluation across 1,770 adjudications showed 93.1% raw agreement, improving to 94.2% after structured readjudication; 10 of 25 (40%) initial CRC discordances were overturned in favor of TRIAGE.
CONCLUSION: TRIAGE accurately determined trial-level eligibility from real-world EHR data and showed strong criterion-level agreement with expert adjudication for oncology protocols. TRIAGE surfaced potential missed enrollment opportunities and supports adjustable trial-level decision thresholds.
Prospective studies of the integration of TRIAGE into routine research workflows are ongoing.
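To illustrate how the four trial-level metrics depend on the decision threshold, the following minimal Python sketch computes sensitivity, specificity, PPV, and NPV from eligibility scores and reference labels. The function name, the data class, the toy scores and labels, and the convention that a score at or above the threshold counts as "eligible or potentially eligible" are all assumptions made for illustration; none of them come from the study, and the toy values do not reproduce its results. Only the two thresholds (0.40 and 0.13) are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def screening_metrics(scores, labels, threshold):
    """Compute binary screening metrics at a given decision threshold.

    Assumption: a patient-trial pair is predicted "eligible or potentially
    eligible" when its score is >= threshold. `labels` holds the reference
    standard (True = eligible or potentially eligible, False = ineligible).
    """
    tp = fp = tn = fn = 0
    for score, eligible in zip(scores, labels):
        predicted = score >= threshold
        if predicted and eligible:
            tp += 1
        elif predicted and not eligible:
            fp += 1
        elif not predicted and eligible:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    return ScreeningMetrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
    )


# Hypothetical toy data (NOT study data): eight patient-trial pairs.
toy_scores = [0.92, 0.61, 0.25, 0.45, 0.15, 0.08, 0.05, 0.02]
toy_labels = [True, True, True, False, False, False, False, False]

# Thresholds quoted in the abstract, applied to the toy data.
at_040 = screening_metrics(toy_scores, toy_labels, threshold=0.40)
at_013 = screening_metrics(toy_scores, toy_labels, threshold=0.13)

print(at_040)
print(at_013)
```

On this toy data, lowering the threshold from 0.40 to 0.13 recovers the one eligible pair that was missed, raising sensitivity, but it also admits one more ineligible pair, lowering specificity and PPV. This is the same trade-off that an adjustable trial-level threshold exposes.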
Patel et al. (Mon,) studied this question.