What question did this study set out to answer?

This research aims to assess the effectiveness of large language models in automating eligibility screening for clinical trials in oncology.

April 5, 2026

Abstract 2739: Evaluation of large language models for automated clinical trial matching in oncology.

Key Result

The gpt-oss20b and gpt-oss120b large language models demonstrated strong reliability (Cohen's kappa >0.8) in automating oncology clinical trial eligibility screening for explicit criteria.
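
For readers unfamiliar with the statistic, the following is a minimal sketch of how Cohen's kappa can be computed for two sets of yes/no eligibility answers. The abstract does not describe how the authors calculated it, so the function name `cohens_kappa` and the example answers are illustrative assumptions, not study data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("rating sequences must be non-empty and equal in length")

    n = len(ratings_a)
    # Observed agreement: fraction of items on which both raters match.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical answers from two models to the same ten eligibility questions.
model_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
model_b = ["yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]
kappa = cohens_kappa(model_a, model_b)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why two models that agree on 9 of 10 answers can still score below the 0.8 threshold cited above.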

Key Points

Evaluated 6 different large language models for their ability to determine clinical trial eligibility.
Extracted data from patient medical records with known trial matches.
Analyzed models' binary (yes/no) responses, confidence scores, and reasoning excerpts.
Assessed concordance between models and the interpretability of their outputs.
gpt-oss20b and gpt-oss120b showed high agreement on eligibility for well-documented criteria such as measurable disease, ECOG status, age, and tissue availability.
Confidence scores for these models were commonly above 0.90.
gpt-oss120b showed greater confidence and more nuanced reasoning in ambiguous cases where documentation was incomplete or inference was required.
Concordance metrics suggested strong reliability, particularly for explicit criteria.
The other four models gave poorer-quality responses overall and could not respond coherently when a structured format was required.
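
The points above mention binary responses, confidence scores, reasoning excerpts, and a structured response format. The abstract does not publish that format, so the sketch below assumes a hypothetical JSON layout with the fields `answer`, `confidence`, and `reasoning`. It shows one way such a response might be validated, and how an incoherent reply would be rejected.

```python
import json
from dataclasses import dataclass


@dataclass
class EligibilityAnswer:
    eligible: bool      # True for "yes", False for "no"
    confidence: float   # expected to lie in [0, 1]
    reasoning: str      # short excerpt explaining the decision


def parse_answer(raw: str) -> EligibilityAnswer:
    """Parse one model reply; raise ValueError if it does not fit the assumed schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    answer = str(data.get("answer", "")).strip().lower()
    if answer not in ("yes", "no"):
        raise ValueError(f"answer must be 'yes' or 'no', got {answer!r}")

    try:
        confidence = float(data.get("confidence"))
    except (TypeError, ValueError):
        raise ValueError("confidence must be a number") from None
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie between 0 and 1")

    return EligibilityAnswer(answer == "yes", confidence, str(data.get("reasoning", "")))


# Hypothetical reply for an ECOG-status question; not taken from the study.
reply = '{"answer": "yes", "confidence": 0.93, "reasoning": "ECOG 1 documented in clinic note."}'
parsed = parse_answer(reply)
```

A reply that is free text rather than JSON, or that omits the yes/no answer, raises `ValueError` here, which is one plausible way to count the incoherent structured responses described for the remaining models.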

Structured PICO

Do large language models accurately automate clinical trial eligibility screening in oncology?

Population

Patient medical records with known trial matches in oncology

Intervention

Automated clinical trial eligibility screening using 6 Large Language Models (llama3.2:3b, llama3.3:70b, medgemma_27b_text_it, deepseek-r1:8b, gpt-oss20b, gpt-oss120b) across 19 key questions

Comparator

Known trial matches

Outcome

Concordance between models and interpretability of outputs for clinical trial eligibility determination

Advanced LLMs like gpt-oss20b and gpt-oss120b show high concordance for explicit oncology trial eligibility criteria, offering a scalable approach to automate patient-trial matching.

Abstract

Background: Efficient patient-trial matching remains a critical challenge in oncology, complicated by heterogeneous documentation, missing data, and complex eligibility criteria. Large Language Models (LLMs) offer potential to automate eligibility screening by interpreting unstructured clinical notes and biomarker data.

Methods: We evaluated 6 models: llama3.2:3b, llama3.3:70b, medgemma_27b_text_it, deepseek-r1:8b, gpt-oss20b and gpt-oss120b for clinical trial eligibility determination across 19 key questions reflecting common eligibility criteria from oncology clinical trials. Data were extracted from patient medical records with known trial matches, and models’ binary (yes/no) responses, confidence scores, and reasoning excerpts were analyzed. Concordance between models and interpretability of outputs were assessed.

Results: Both gpt-oss20b and gpt-oss120b models demonstrated high agreement on eligibility determinations for well-documented criteria such as measurable disease, ECOG status, age, and tissue availability, with confidence scores commonly above 0.90. Differences emerged in criteria requiring inference or where documentation was incomplete; gpt-oss120b showed greater confidence and nuanced reasoning in ambiguous cases. Both models flagged missing or unclear data, providing reasoning transparency that supports clinical review. Concordance metrics suggested strong reliability (Cohen’s kappa >0.8) for explicit criteria, with potential to significantly reduce manual screening burden. The remaining models provided poorer quality responses in general and were unable to respond coherently at all if required to provide that response in a structured format.

Conclusions: LLMs can accurately and transparently automate critical components of oncology trial eligibility screening, augmenting manual review processes. Differences in model confidence with uncertain data underscore the need for ongoing refinement and highlight the value of explainable AI in clinical decision support. These findings support integrating LLMs into clinical trial matching workflows to improve trial access and enrollment efficiency.

Impact: Automated, interpretable LLM-based clinical trial matching represents a promising advancement toward precision oncology by scaling patient access to tailored therapies and optimizing trial throughput.

Citation Format: Aakash Desai, Ellen McNeeley, Sanad Alhuski, Maya Khalil, Matthew Might, Rebecca Arend, Andrew Crouse, Mehmet Akce. Evaluation of large language models for automated clinical trial matching in oncology [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 2739.