Abstract
Clinicians spend significant time reviewing medical images and transcribing findings. By integrating visual and textual data, foundation models have the potential to reduce workloads and boost efficiency, yet their practical clinical value remains uncertain. In this study, we find that OpenAI's ChatGPT-4o and two medical vision-language models (VLMs) significantly underperform ophthalmologists in key tasks for age-related macular degeneration (AMD). To address this, we developed a dedicated training curriculum, designed by domain specialists, to optimize VLMs for tasks related to clinical decision making. The resulting model, RetinaVLM-Specialist, significantly outperforms foundation medical VLMs and ChatGPT-4o in AMD disease staging (F1: 0.63 vs. 0.33) and referral (F1: 0.67 vs. 0.50), achieving performance comparable to junior ophthalmologists. In a reader study, two senior ophthalmologists confirmed that RetinaVLM's reports were substantially more accurate than those written by ChatGPT-4o (64.3% vs. 14.3%). Overall, our curriculum-based approach offers a blueprint for adapting foundation models to real-world medical applications.
Robbie Holland
Thomas R. Taylor
Christopher Holmes
npj Digital Medicine
University of Michigan
University College London
Imperial College London
Holland et al. studied this question.
www.synapsesocial.com/papers/68af474ead7bf08b1ead3a81 — DOI: https://doi.org/10.1038/s41746-025-01893-8