What question did this study set out to answer?

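The study asked whether ophthalmic foundation deep-learning models, combined with cross-modal transfer learning, can classify multiple retinal diseases on optical coherence tomography angiography (OCTA) when the sample size is limited.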

May 7, 2026

Classification of retinal diseases based on optical coherence tomography angiography using cross-modal transfer learning of domain-specific foundation AI models

Key Points

Aimed to classify multiple retinal diseases on optical coherence tomography angiography (OCTA) using foundation deep-learning models and cross-modal transfer learning.
Used the OCTA-500 dataset (500 subjects), split into an 85% training/validation set and a 15% held-out test set.
Applied fivefold cross-validation and intermediate fusion of superficial and deep projections from OCTA.
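Evaluated two domain-specific foundation models, Vision-Transformer-VisionFM and Vision-Transformer-RETFound, against Vision-Transformer-ImageNet, a non-domain-specific model, on accuracy, F1-score, precision, recall and area under the receiver operating characteristic curve.
VisionFM achieved 81.33% accuracy, outperforming the ImageNet model's 76.00%.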
RETFound reached 80.00% accuracy, also surpassing ImageNet's performance.
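Saliency maps from VisionFM and RETFound localized pathology to relevant retinal structures on the superficial and deep OCTA projections, comparing favourably with those from the ImageNet models.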
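
The split described in the key points can be illustrated with a short sketch. This is not the authors' code: the function name `split_subjects`, the fixed seed and the plain (unstratified) shuffle are assumptions made for illustration only. The source states just the 85%/15% proportions and the use of fivefold cross-validation.

```python
import random

def split_subjects(subject_ids, test_fraction=0.15, n_folds=5, seed=0):
    """Hold out a test set, then build cross-validation folds from the rest.

    Hypothetical helper: the shuffle is unstratified, which is an assumption;
    the study does not say how subjects were assigned.
    """
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)

    n_test = round(len(ids) * test_fraction)
    test_ids = ids[:n_test]   # held-out test set, never used for model selection
    dev_ids = ids[n_test:]    # training/validation pool

    # Deal the development subjects into n_folds nearly equal validation folds.
    val_folds = [dev_ids[k::n_folds] for k in range(n_folds)]

    folds = []
    for val in val_folds:
        val_set = set(val)
        train = [s for s in dev_ids if s not in val_set]
        folds.append((train, val))
    return test_ids, folds

test_ids, folds = split_subjects(range(500))
print(len(test_ids), [(len(train), len(val)) for train, val in folds])
```

With 500 subjects this gives 75 held-out test subjects and five folds of 340 training and 85 validation subjects each. Splitting by subject rather than by image keeps all data from one person on the same side of the split.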

Abstract

Aims: This study evaluated the use of ophthalmic foundation deep-learning models with cross-modal transfer learning to classify multiple diseases on optical coherence tomography angiography (OCTA) with limited sample size.

Methods: The OCTA-500 dataset (n=500 subjects) was split into an 85% training/validation set for fivefold cross-validation and a 15% held-out test set. Superficial and deep projections from OCTA were combined using intermediate fusion. Outcomes were multi-disease classification of normal, diabetic retinopathy, age-related macular degeneration and ‘other’. Transfer learning from colour fundus photography was used to overcome the limitation of small sample sizes. Vision-Transformer-VisionFM and Vision-Transformer-RETFound domain-specific foundation models with cross-modal transfer learning were evaluated. Comparison was made with Vision-Transformer-ImageNet, a non-domain-specific model. Performance was evaluated using accuracy, F1-score, precision, recall and area under the receiver operating characteristic curve. Saliency maps were also computed.

Results: VisionFM with cross-modal transfer learning outperformed ImageNet in classifying different diseases on OCTA (accuracy: 0.8133±0.0470 vs 0.7600±0.0502). RETFound with cross-modal transfer learning outperformed ImageNet in classifying different diseases on OCTA (accuracy: 0.8000±0.0507 vs 0.7600±0.0521). Similar conclusions were reached with other performance metrics. Saliency maps from VisionFM and RETFound yielded attention patterns that localised pathology to relevant retinal structures on superficial and deep projections from OCTA, comparing favourably with those from ImageNet models.

Conclusions: Retinal foundation models with cross-modal transfer learning enable accurate multi-class classification using OCTA data despite the small sample size. Results from domain-specific foundation models compared favourably with a non-domain-specific model. Saliency analysis showed attention patterns of pathology localised to anatomically relevant retinal structures.
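
Intermediate fusion generally means that each input is encoded separately and the resulting features are combined before a shared classifier. The sketch below illustrates that idea for the superficial and deep projections. It is a toy under stated assumptions, not the study's architecture: the single-layer `encode` function stands in for a Vision Transformer backbone, concatenation is assumed as the fusion operator (the abstract does not name one), and `init_params` and `classify_fused` are hypothetical helper names.

```python
import math
import random

CLASSES = ["normal", "diabetic retinopathy", "age-related macular degeneration", "other"]

def init_params(n_pixels, n_features=8, n_classes=4, seed=0):
    """Random weights for two toy encoders and one classification head."""
    rng = random.Random(seed)
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "enc_superficial": matrix(n_features, n_pixels),
        "enc_deep": matrix(n_features, n_pixels),
        "head_w": matrix(n_classes, 2 * n_features),  # head sees both feature sets
        "head_b": [0.0] * n_classes,
    }

def encode(image, weights):
    """Toy encoder (linear layer + ReLU); a placeholder for a ViT backbone."""
    flat = [px for row in image for px in row]
    return [max(0.0, sum(w * x for w, x in zip(w_row, flat))) for w_row in weights]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_fused(superficial, deep, params):
    """Encode each projection separately, concatenate features, then classify."""
    f_sup = encode(superficial, params["enc_superficial"])
    f_deep = encode(deep, params["enc_deep"])
    fused = f_sup + f_deep  # list concatenation: the assumed fusion step
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(params["head_w"], params["head_b"])]
    return softmax(logits)

# Two random 4x4 "projections" of the same eye, purely illustrative data.
rng = random.Random(1)
superficial = [[rng.random() for _ in range(4)] for _ in range(4)]
deep = [[rng.random() for _ in range(4)] for _ in range(4)]

probs = classify_fused(superficial, deep, init_params(n_pixels=16))
for name, p in zip(CLASSES, probs):
    print(f"{name}: {p:.3f}")
```

The contrast is with early fusion, which would stack the two projections as input channels, and late fusion, which would average the predictions of two separate classifiers.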
