Open-access endoscopy relies on referrals that are vetted manually, a resource-intensive process with potential for bias. While large language models (LLMs) have shown potential in medical applications, their ability to autonomously manage complex referral logistics remains understudied. We assessed whether LLMs can provide accurate recommendations on gastrointestinal endoscopy referrals. We extracted 200 multilingual endoscopy referrals containing structured and unstructured medical data and evaluated OpenAI’s o3 and Google’s Gemini 2.5-pro. A prompt was tuned on a set of 20 referrals and tested on the remaining 180. Eight variables were tested: procedure type, indication, need for an anesthesiologist, omission of anti-aggregants, omission of anti-coagulants, omission of glucagon-like peptide-1 receptor agonists (GLP-1RAs), implantable electronic devices, and need for intensified preparation. LLM responses were benchmarked against the assessments of expert gastroenterologists. Accuracy and F1 scores were analyzed using bootstrapping, and the models were compared using McNemar’s test. Confusion matrices were calculated. Additionally, o3 generated patient-specific visual timelines. Among the 200 referrals, 88 (44%) were for colonoscopy and 53 (26.5%) for esophagogastroduodenoscopy; 65 (32.5%) required an anesthesiologist and 65 (32.5%) required intensified preparation. Both models performed comparably well, with o3 achieving 91%–100% accuracy and Gemini 2.5-pro achieving 89%–99% accuracy across all variables. There were no statistically significant differences between the models. Confusion matrix analysis confirmed high precision (> 95%) and specificity (> 91%) for both, indicating high reliability in resource allocation. In addition, o3 generated accurate, patient-specific visual instructions for all sampled cases. LLMs are highly accurate in processing endoscopy referrals and can generate patient-specific instructions.
These tools offer a promising solution to streamline endoscopy workflows, reduce physician burden, and improve patient communication.
Gorelik et al. (Tue,) studied this question.
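The statistical workflow named in the abstract (bootstrapped accuracy and F1, plus McNemar's test for comparing two models on the same referrals) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the abstract does not state the number of bootstrap resamples, the type of confidence interval, or which McNemar variant was used, so this sketch uses a percentile bootstrap and the exact binomial form of McNemar's test. The function names are hypothetical and the labels are toy values, not study data.

```python
import random
from math import comb


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the expert label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """F1 score for one binary variable; returns 0.0 if undefined."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed variant) for a metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample referrals with replacement, keeping label/prediction pairs.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value from the discordant pairs only."""
    # b: model A correct, model B wrong; c: model A wrong, model B correct.
    b = sum(a == t and m != t for t, a, m in zip(y_true, pred_a, pred_b))
    c = sum(a != t and m == t for t, a, m in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree in correctness
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)


if __name__ == "__main__":
    # Toy labels for one binary variable (e.g. "needs anesthesiologist").
    expert = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    model_a = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
    model_b = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
    print("accuracy A:", accuracy(expert, model_a))
    print("95% CI A:", bootstrap_ci(expert, model_a, accuracy))
    print("F1 A:", f1_binary(expert, model_a))
    print("McNemar p:", mcnemar_exact(expert, model_a, model_b))
```

McNemar's test uses only the referrals on which exactly one model is correct, which is why it suits paired comparisons like this one, where both models answer the same 180 test referrals. In practice an established statistics library would likely be preferable to hand-rolled functions.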