March 3, 2026 | Open Access

Automatic processing of gastrointestinal endoscopy referrals and patient instructions using large language models

Key points

Accurate automated processing of gastrointestinal endoscopy referrals could enhance clinical workflows and reduce manual effort.
OpenAI's o3 achieved 91%–100% accuracy and Google's Gemini 2.5-pro achieved 89%–99%, indicating high performance across all assessed variables.
Confusion matrix analysis showed high precision and specificity, indicating substantial reliability for referral logistics.
These findings suggest that large language models could reduce physician workload while improving patient communication.

Abstract

Open-access endoscopy relies on referrals that are vetted manually, a resource-intensive process that is potentially subject to bias. While large language models (LLMs) have demonstrated potential in medical applications, their ability to autonomously manage complex referral logistics remains understudied. We assessed whether LLMs can provide accurate recommendations on gastrointestinal endoscopy referrals.

We extracted 200 multilingual endoscopy referrals containing structured and unstructured medical data and evaluated OpenAI's o3 and Google's Gemini 2.5-pro. A prompt was tuned on a set of 20 referrals and tested on the remaining 180. Eight variables were tested: procedure type; indication; need for an anesthesiologist; omission of anti-aggregants, of anti-coagulants, and of glucagon-like peptide-1 receptor agonists (GLP-1RAs); presence of implantable electronic devices; and need for intensified preparation. LLM responses were benchmarked against expert gastroenterologists. Accuracy and F1 scores were analyzed using bootstrapping, and the models were compared with McNemar's test. Confusion matrices were calculated. Additionally, o3 generated patient-specific visual timelines.

Of the 200 referrals, 88 (44%) were for colonoscopy and 53 (26.5%) for esophagogastroduodenoscopy; 65 (32.5%) required an anesthesiologist and 65 (32.5%) required intensified preparation. Both models demonstrated comparably high performance, with o3 achieving 91%–100% accuracy and Gemini 2.5-pro achieving 89%–99% accuracy across all variables. There were no statistically significant differences between the models. Confusion matrix analysis confirmed high precision (> 95%) and specificity (> 91%) for both models, indicating high reliability in resource allocation. Additionally, o3 generated accurate, patient-specific visual instructions for all sampled cases.

LLMs are highly accurate in processing endoscopy referrals and can generate patient-specific instructions. These tools offer a promising way to streamline endoscopy workflows, reduce physician burden, and improve patient communication.
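The statistical evaluation named in the abstract (bootstrapped accuracy and F1, McNemar's test, and confusion-matrix metrics) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes a single binary variable (for example, "needs an anesthesiologist" coded as 1/0), a percentile bootstrap, and the exact binomial form of McNemar's test; the abstract does not state which variants were used. All function names and parameter defaults are hypothetical.

```python
import random
from math import comb


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for one binary variable."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def precision_specificity(y_true, y_pred):
    tp, fp, _, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return precision, specificity


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed variant): resample referrals
    with replacement and recompute the metric on each resample."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value, based only on discordant pairs:
    b = model A correct and model B wrong, c = the reverse."""
    b = sum(a == t and m != t for t, a, m in zip(y_true, pred_a, pred_b))
    c = sum(a != t and m == t for t, a, m in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree on correctness
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)
```

In use, `y_true` would hold the expert gastroenterologists' labels for one variable across the 180 test referrals, and `pred_a` and `pred_b` the two models' outputs for the same referrals. Multi-class variables such as procedure type would need an averaged F1 (macro or weighted), which this sketch does not cover.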
