Purpose: Identifying appropriate clinical trials for cancer patients with specific gene mutations remains a significant challenge, largely because current search tools such as ClinicalTrials.gov sometimes return irrelevant or misleading results. This diagnostic accuracy study investigates how well 2 large language models (LLMs), GPT-4.0 and Gemini 2.0, evaluate the eligibility of patients with specific cancer-related gene mutations for clinical trials. Methods: We prompt GPT-4.0 and Gemini 2.0 with trial details from ClinicalTrials.gov and a particular cancer mutation, then assess model performance against physician-curated benchmarks across 6 gene mutations (ALK, BRAF, EGFR, ERBB2, KIT, and KRAS). Results: Both LLMs achieve good F1-scores, averaging 64% for GPT-4.0 and 70% for Gemini 2.0, highlighting their potential to streamline clinical trial matching. Furthermore, decision trees provide interpretability by identifying key textual indicators that the LLMs use. Conclusion: This work demonstrates the feasibility of using proprietary LLMs such as GPT-4.0 and Gemini 2.0 “off the shelf,” with both limited LLM fine-tuning and limited patient information, to evaluate clinical trial eligibility.
Gandy et al. (Sun,) studied this question.
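To make the reported metric concrete, the following minimal sketch shows how an F1-score can be computed when an LLM's eligibility judgments are compared with physician-curated labels, and how per-mutation scores can be averaged. The function names, the toy labels, and the use of an unweighted mean across mutations are illustrative assumptions; the abstract does not specify the study's evaluation code or averaging scheme.

```python
def f1_score(gold, predicted):
    """F1 for binary labels (True = trial judged relevant to the mutation).

    `gold` stands in for physician-curated labels and `predicted` for LLM
    outputs; both are hypothetical inputs used only for illustration.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(per_gene_scores):
    """Unweighted mean of per-mutation F1-scores (an assumed averaging scheme)."""
    return sum(per_gene_scores.values()) / len(per_gene_scores)


# Toy labels for five trials under one mutation (not data from the study).
gold = [True, True, False, True, False]
predicted = [True, False, False, True, True]
score = f1_score(gold, predicted)  # 2 TP, 1 FP, 1 FN
```

In this toy case precision and recall are both 2/3, so the F1-score is also 2/3. Because F1 is the harmonic mean of precision and recall, a model cannot score well by labeling every trial as relevant or every trial as irrelevant.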