Background/Objectives: General-purpose and domain-specific multimodal foundation models show considerable promise in medical image analysis. In this study, we evaluated how accurately such models classify fundus images as diabetic retinopathy or normal, comparing general-purpose conversational models (Gemini 3 Flash, GPT-5.2, and Pixtral-Large), a medical conversational model (MedGemma-1.5) and its image encoder (MedSigLIP), and ophthalmology-specific models (RETFound and EyeCLIP). Methods: We applied zero-/few-shot prompting to general-purpose conversational models, and linear probing and fine-tuning to domain-specific models. Results: The zero-shot accuracy of Pixtral-Large (70.7%) and the accuracy of fine-tuned RETFound (77.1%) were comparable but lower than the zero-shot accuracies of GPT-5.2 (77.9%), MedGemma-1.5 (88.2%), and Gemini 3 (88.5%), and lower than the accuracies of fine-tuned EyeCLIP (85.8%) and MedSigLIP (94.8%). Few-shot prompting yielded a substantial accuracy gain for Pixtral-Large (+7.4%) and a limited gain for GPT-5.2 (+3.6%), but reduced accuracy for Gemini 3 (−3.4%) and MedGemma-1.5 (−1.1%). Relative to fine-tuning, embedding-based linear probing improved accuracy for RETFound (+9.7%), yielded only a marginal gain for EyeCLIP (+2.3%), and did not benefit MedSigLIP (−0.8%). Overall, with minimal prompting enhancement, general-purpose conversational models such as Gemini 3 and GPT-5.2 achieved performance comparable to that of ophthalmology-specific models that were either fine-tuned or enhanced via embedding-based linear probing, but remained inferior to MedSigLIP and its conversational counterpart, MedGemma-1.5. Conclusions: The findings highlight a trade-off between specialization and flexibility: domain-specific models provide higher accuracy and stability, while general-purpose multimodal models offer greater accessibility, adaptability, and interactive reasoning. The two classes of models can therefore serve as complementary tools for retinal disease screening and clinical decision support.
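The embedding-based linear probing mentioned in the abstract can be illustrated with a minimal sketch: a frozen image encoder supplies one fixed embedding per image, and only a linear classifier is trained on top of those embeddings. Everything in the block below is an assumption made for illustration and is not the authors' implementation. The synthetic vectors stand in for encoder outputs, the label convention (0 = normal, 1 = diabetic retinopathy) is assumed, and the function names and hyperparameters are hypothetical.

```python
import math
import random


def make_toy_embeddings(n_per_class, dim, seed):
    """Hypothetical stand-in for frozen-encoder outputs: two Gaussian clusters."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, mean in ((0, -1.0), (1, 1.0)):  # assumed: 0 = normal, 1 = DR
        for _ in range(n_per_class):
            xs.append([rng.gauss(mean, 0.5) for _ in range(dim)])
            ys.append(label)
    return xs, ys


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(embeddings, labels, lr=0.1, epochs=200):
    """Fit logistic regression on fixed embeddings with batch gradient descent.

    Only the weights w and bias b are learned; the embeddings are inputs
    and are never modified, which is what makes this a linear probe.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, embeddings, labels):
    """Fraction of embeddings whose predicted class matches the label."""
    correct = 0
    for x, y in zip(embeddings, labels):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(labels)


if __name__ == "__main__":
    train_x, train_y = make_toy_embeddings(50, 4, seed=0)
    test_x, test_y = make_toy_embeddings(50, 4, seed=1)
    w, b = train_linear_probe(train_x, train_y)
    print("held-out accuracy:", accuracy(w, b, test_x, test_y))
```

In a real setting the toy vectors would be replaced by embeddings extracted once from the frozen encoder. Because only the linear layer is trained, the probe is much cheaper than fine-tuning the full model.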
Mohammad Iqbal Nouyed
Mohammad Al-Mamun
Donald Adjeroh
Diagnostics
West Virginia University
Binghamton University
West Virginia University Hospitals
Nouyed et al. studied this question.
www.synapsesocial.com/papers/6a095bef7880e6d24efe1d3d — DOI: https://doi.org/10.3390/diagnostics16101504