August 9, 2025

Enhancing Interpretability of Ocular Disease Diagnosis: A Zero-Shot Study of Multimodal Large Language Models.

Key Points

Multimodal large language models are evaluated in a zero-shot setting for generating interpretable explanations of ocular disease diagnoses.
RETFound, fine-tuned on the BRSET dataset, achieved AUC-ROC scores of 0.9664 for diabetic retinopathy and 0.8611 for glaucoma.
GPT-o1 outperformed the four other LLMs compared, achieving 79.32% precision, 77.18% recall, and a 20.68% hallucination rate.
The findings emphasize that reliable clinical explanations depend on both underlying diagnostic accuracy and advanced model architecture.

Abstract

Visual foundation models have advanced ocular disease diagnosis, yet providing interpretable explanations remains challenging. We evaluate multimodal LLMs for generating explanations of ocular diagnoses, combining Vision Transformer-derived saliency maps with clinical metadata. After fine-tuning RETFound for improved performance on the BRSET dataset (AUC-ROC 0.9664/0.8611 for diabetic retinopathy/glaucoma), we compared five LLMs through technical and clinical evaluations. GPT-o1 demonstrated superior performance across technical dimensions and clinical metrics (79.32% precision, 77.18% recall, 78.25% F1, 20.68% hallucination rate). Our findings highlight the importance of underlying diagnostic accuracy and advanced model architecture for generating reliable clinical explanations, suggesting opportunities for integrated verification mechanisms in future developments. The code and details can be found at: https://github.com/YatingPan/ocular-llm-explainability.
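
As a quick sanity check on how the reported clinical metrics relate to one another, the sketch below recomputes F1 as the harmonic mean of the rounded precision and recall from the abstract. It also shows that the reported hallucination rate equals 100% minus the reported precision. Whether the authors define hallucination rate that way is an assumption here, not something the abstract states, and the function name `f1_score` is illustrative rather than taken from the linked repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported in the abstract for GPT-o1, in percent.
precision = 79.32
recall = 77.18

f1 = f1_score(precision, recall)

# Assumption: hallucination rate read as the complement of precision.
# The abstract does not confirm this definition; the numbers merely agree.
complement_of_precision = 100.0 - precision

print(f"F1 from rounded inputs: {f1:.2f}%")
print(f"100% - precision:       {complement_of_precision:.2f}%")
```

The recomputed F1 comes out at roughly 78.2%, in line with the reported 78.25%; a difference in the last digit is expected because the inputs here are already rounded.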
