February 25, 2026 · Open Access

AI-Driven Patient Screening for Clinical Trials in Pancreatic Cancer: The PANCR-AI Pilot Retrospective Comparative Study

Key Points

The study evaluated how well AI screens patients for clinical trial eligibility compared with human experts.
It was a retrospective cohort review of patients with advanced pancreatic cancer.
Eligibility criteria for clinical trials were screened both by 3 large language models and by human reviewers.
AI performance was compared with a human gold standard using metrics such as sensitivity and specificity.
The AI models showed high sensitivity, ranging from 83.3% to 92.2%.
Manual screening took significantly longer (44.70 hours) than AI screening (2.53-3.15 hours).
Patients were more likely to be included in a trial when more trials were open for enrollment.
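The sensitivity and specificity figures above are standard confusion-matrix metrics, computed by comparing each model's eligibility call with the human gold standard. A minimal Python sketch follows, assuming each patient-trial pair is coded as a boolean "potentially eligible" call from both the gold standard and a model. The names `screening_metrics` and `ScreeningMetrics` are hypothetical and are not the study's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningMetrics:
    sensitivity: float  # share of truly eligible pairs the model flagged (= recall)
    specificity: float  # share of truly ineligible pairs the model rejected
    precision: float    # share of model-flagged pairs that are truly eligible
    f1: float           # harmonic mean of precision and recall


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of dividing by zero when a class is absent.
    return num / den if den else 0.0


def screening_metrics(gold: list[bool], predicted: list[bool]) -> ScreeningMetrics:
    """Compare model eligibility calls with the human gold standard (hypothetical helper).

    Each index is one patient-trial pair; True means "potentially eligible".
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # eligible, flagged
    fn = sum(1 for g, p in pairs if g and not p)      # eligible, missed
    tn = sum(1 for g, p in pairs if not g and not p)  # ineligible, rejected
    fp = sum(1 for g, p in pairs if not g and p)      # ineligible, flagged

    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return ScreeningMetrics(sensitivity, specificity, precision, f1)


# Toy example (invented data): 3 truly eligible pairs, of which the model flags 2.
m = screening_metrics([True, True, True, False, False],
                      [True, True, False, False, True])
print(m)
```

Recall is identical to sensitivity, so the sketch stores it only once. For a screening aid, a missed eligible patient (false negative) is usually the costlier error, which is why sensitivity is the headline metric.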

Abstract

Background: Screening patients for clinical trials is challenging for clinicians because it is time-consuming and repetitive. The rise of artificial intelligence (AI) offers an opportunity to improve screening productivity and reproducibility. Pancreatic cancer is characterized by increasing incidence, poor survival outcomes, and an urgent need for improved management strategies.

Objective: This study aimed to assess the performance of AI in evaluating clinical trial inclusion and exclusion criteria, compared with a double-blind human gold standard, using a retrospective cohort.

Methods: In the PANCR-AI (Pancreatic Cancer Retrospective Screening with Artificial Intelligence) pilot study, we retrospectively reviewed cases from our institutional database of patients with advanced pancreatic cancer presented at tumor board meetings between January 2018 and December 2023. Each patient was screened for the clinical trials open for inclusion at the time of the multidisciplinary meeting. For each patient-trial pair, 2 blinded oncologists manually screened the eligibility criteria to determine potential eligibility (gold standard), and a third oncologist resolved discrepancies. Potential eligibility was also assessed with 3 large language models (ie, GPT-4.5, Claude 3.7 Sonnet, and Mistral-7B-Instruct v0.3). Their performance was compared with the human gold standard using standard evaluation metrics (eg, sensitivity, specificity, precision, recall, and F1-score). Correlations between the risk of failure and the number of words and characters in the criteria were analyzed. The time required to complete the screening was recorded for both human and AI assessments. The number of trials open for enrollment at the time of the tumor board meeting was also recorded as a variable for analysis.

Results: Across 341 patient-trial pairs, the AI models demonstrated high sensitivity, ranging from 83.3% to 92.2%. Analysis of the criteria showed a correlation between the risk of failure and both the number of words and the number of characters in the criteria. Overall screening time was significantly longer for the manual gold-standard assessment (44.70 hours) than for AI (2.53-3.15 hours). Patients were more likely to have been included in a clinical trial when more trials were open for enrollment at the time of the tumor board meeting (P=.02).

Conclusions: Our study highlights the promising performance of AI in clinical trial screening. Future work should explore integration with structured clinical data, such as laboratory values or radiological findings, to improve multimodal comprehension. Expanding the evaluation to a broader range of tumor types and multicenter datasets would improve generalizability. Finally, real-time prospective validation and workflow integration with electronic health records will be critical to assess the feasibility and clinical impact of large language model-assisted screening in daily oncology practice. Addressing these challenges will be essential to move from proof of concept to scalable clinical implementation.
