The rising prevalence of vision-threatening retinal diseases poses a significant burden on the global healthcare systems. Though deep learning (DL) techniques offer promising avenues for improving diagnostic efficiency, data scarcity and imbalance issues persist in training robust diagnostic models, particularly for rare eye diseases. Here, we introduce EyeDiff, a generative foundation model capable of synthesizing lesion-preserving ophthalmic images from textual descriptions. Both objective metrics and expert human evaluations confirmed EyeDiff's ability to generate high-fidelity images across multiple imaging modalities, accurately reflecting textual descriptions of diverse retinal diseases and lesion types. By augmenting minority classes across 11 globally sourced datasets, EyeDiff consistently boosted the diagnostic accuracy for both common and rare eye diseases across different foundation model types, including modality-specific, multimodal and vision-language foundation models trained solely on real data. These results underscore EyeDiff's potential as a general-purpose text-to-image foundation model, offering a scalable and flexible approach to generate balanced, disease-relevant data for advancing retinal disease diagnosis.
Chen et al. (Mon,) studied this question.