August 19, 2025 | Open Access

Specialized curricula for training vision language models in retinal image analysis

Key Points

RetinaVLM-Specialist significantly outperforms existing vision-language models (VLMs) in analyzing retinal images for age-related macular degeneration (AMD).
In AMD disease staging, RetinaVLM-Specialist reached an F1 score of 0.63, compared with 0.33 for the competing models (foundation medical VLMs and ChatGPT-4o).
The dedicated training curriculum was developed by domain specialists to address specific clinical applications.
This approach offers a framework for effectively adapting foundation models to real-world medical needs.

Abstract

Clinicians spend significant time reviewing medical images and transcribing findings. By integrating visual and textual data, foundation models have the potential to reduce workloads and boost efficiency, yet their practical clinical value remains uncertain. In this study, we find that OpenAI’s ChatGPT-4o and two medical vision-language models (VLMs) significantly underperform ophthalmologists in key tasks for age-related macular degeneration (AMD). To address this, we developed a dedicated training curriculum, designed by domain specialists, to optimize VLMs for tasks related to clinical decision making. The resulting model, RetinaVLM-Specialist, significantly outperforms foundation medical VLMs and ChatGPT-4o in AMD disease staging (F1: 0.63 vs. 0.33) and referral (0.67 vs. 0.50), achieving performance comparable to junior ophthalmologists. In a reader study, two senior ophthalmologists confirmed that RetinaVLM’s reports were substantially more accurate than those written by ChatGPT-4o (64.3% vs. 14.3%). Overall, our curriculum-based approach offers a blueprint for adapting foundation models to real-world medical applications.

Authors

Robbie Holland

Thomas R. Taylor

Christopher Holmes

Journals

npj Digital Medicine

Institutions

University of Michigan

University College London

Imperial College London
