What question did this study set out to answer?

The aim is to assess the effectiveness of large language models in identifying suitable clinical trials for cancer patients with gene mutations.

February 25, 2026 · Open Access

Enhancing Clinical Trial Selection for Cancer Patients Using Large Language Models

Key Points

Evaluated two LLMs, GPT-4.0 and Gemini 2.0, against physician-curated benchmarks.
Used trial details from ClinicalTrials.gov linked to specific cancer mutations.
Assessed performance across six gene mutations: ALK, BRAF, EGFR, ERBB2, KIT, and KRAS.
Employed decision trees to analyze model interpretability and key indicators.
GPT-4.0 achieved an average F1-score of 64%.
Gemini 2.0 achieved an average F1-score of 70%.
Both models demonstrated potential for improving clinical trial matching.
Decision trees helped identify indicators used by the LLMs for eligibility evaluation.

Abstract

Purpose: Identifying appropriate clinical trials for cancer patients with specific gene mutations remains a significant challenge, largely because current search tools such as ClinicalTrials.gov sometimes return irrelevant or misleading results. This diagnostic accuracy study investigates how well 2 large language models (LLMs), GPT-4.0 and Gemini 2.0, evaluate the eligibility of patients with specific cancer-related gene mutations for clinical trials. Methods: The study prompts GPT-4.0 and Gemini 2.0 with trial details from ClinicalTrials.gov and a particular cancer mutation, then assesses model performance against physician-curated benchmarks across 6 gene mutations (ALK, BRAF, EGFR, ERBB2, KIT, and KRAS). Results: Both LLMs achieved good F1-scores, averaging 64% for GPT-4.0 and 70% for Gemini 2.0, which highlights their potential to streamline clinical trial matching. Furthermore, decision trees provided interpretability by identifying key textual indicators that the LLMs use. Conclusion: This work demonstrates the feasibility of using proprietary LLMs such as GPT-4.0 and Gemini 2.0 “off the shelf”, with both limited LLM fine-tuning and limited patient information, to evaluate clinical trial eligibility.
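For readers unfamiliar with the metric, the following is a minimal sketch of how an F1-score could be computed from a model's eligibility judgments and a physician-curated benchmark, then averaged across mutations. The function names, the boolean label encoding, and the unweighted mean across mutations are assumptions made for illustration; the abstract does not state how the authors computed their averages, and the labels in the example are invented, not study data.

```python
from typing import Dict, List, Tuple


def f1_score(predicted: List[bool], reference: List[bool]) -> float:
    """F1 for binary labels, where True means 'trial judged suitable'."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # true positives
    fp = sum(1 for p, r in pairs if p and not r)    # false positives
    fn = sum(1 for p, r in pairs if not p and r)    # false negatives
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly matched
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(per_mutation: Dict[str, Tuple[List[bool], List[bool]]]) -> float:
    """Unweighted mean of per-mutation F1 scores (an assumed averaging scheme)."""
    scores = [f1_score(pred, ref) for pred, ref in per_mutation.values()]
    return sum(scores) / len(scores)


# Hypothetical labels for two mutations: (model output, physician benchmark).
example = {
    "ALK": ([True, True, False, False], [True, False, True, False]),
    "KRAS": ([True, True], [True, True]),
}
print(f"average F1: {average_f1(example):.2f}")
```

Because F1 is the harmonic mean of precision and recall, a model is penalized both for recommending unsuitable trials and for missing suitable ones.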

Cite This Study

Gandy et al. studied this question.

synapsesocial.com/papers/699e927bf5123be5ed050311 https://doi.org/10.1177/11769351251399641
