Aims
This study evaluated ophthalmic foundation deep-learning models with cross-modal transfer learning for classifying multiple diseases on optical coherence tomography angiography (OCTA) when the sample size is limited.

Methods
The OCTA-500 dataset (n=500 subjects) was split into an 85% training/validation set, used for fivefold cross-validation, and a 15% held-out test set. Superficial and deep OCTA projections were combined using intermediate fusion. The outcome was multi-class classification into four categories: normal, diabetic retinopathy, age-related macular degeneration and ‘other’. Transfer learning from colour fundus photography was used to mitigate the small sample size. Two domain-specific foundation models with cross-modal transfer learning, Vision-Transformer-VisionFM and Vision-Transformer-RETFound, were evaluated and compared with Vision-Transformer-ImageNet, a non-domain-specific model. Performance was measured using accuracy, F1-score, precision, recall and area under the receiver operating characteristic curve. Saliency maps were also computed.

Results
VisionFM with cross-modal transfer learning outperformed ImageNet in classifying diseases on OCTA (accuracy: 0.8133±0.0470 vs 0.7600±0.0502), as did RETFound with cross-modal transfer learning (accuracy: 0.8000±0.0507 vs 0.7600±0.0521). The other performance metrics supported the same conclusions. Saliency maps from VisionFM and RETFound localised pathology to relevant retinal structures on both superficial and deep OCTA projections, comparing favourably with those from ImageNet models.

Conclusions
Retinal foundation models with cross-modal transfer learning enabled accurate multi-class classification of OCTA data despite the small sample size, and they compared favourably with a non-domain-specific model. Saliency analysis showed attention patterns that localised pathology to anatomically relevant retinal structures.
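The split described in the Methods works out as follows: 15% of 500 subjects gives 75 held-out test subjects. That leaves 425 subjects for fivefold cross-validation, or 85 per fold. In each iteration, 340 subjects are used for training and 85 for validation. The sketch below illustrates this arithmetic under stated assumptions. It uses a simple random split at the subject level, with a hypothetical seed and hypothetical function names (`split_subjects`, `cv_iterations`). It is not the study's actual procedure, which may, for example, have stratified by diagnosis.

```python
import random


def split_subjects(subject_ids, test_pct=15, n_folds=5, seed=0):
    """Hold out a test set, then partition the remainder into CV folds.

    Assumption (hypothetical): a simple random, non-stratified split at the
    subject level, so no subject appears in more than one partition.
    """
    ids = list(subject_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Integer arithmetic avoids floating-point rounding: 500 * 15 // 100 == 75.
    n_test = len(ids) * test_pct // 100
    test_ids = ids[:n_test]
    trainval_ids = ids[n_test:]
    # Assign the remaining subjects to folds in round-robin order.
    folds = [trainval_ids[k::n_folds] for k in range(n_folds)]
    return trainval_ids, test_ids, folds


def cv_iterations(folds):
    """Yield (train, validation) pairs; each fold is the validation set once."""
    for k, val in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield train, val


if __name__ == "__main__":
    # Placeholder IDs 0..499 stand in for the 500 OCTA-500 subjects.
    trainval, test, folds = split_subjects(range(500))
    print(len(trainval), len(test), [len(f) for f in folds])
    for train, val in cv_iterations(folds):
        print(len(train), len(val))
```

Splitting by subject rather than by image keeps a subject's superficial and deep projections in the same partition. This prevents leakage between the training, validation and test sets.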
Shah et al. studied this question.