Visual foundation models have advanced ocular disease diagnosis, yet providing interpretable explanations remains challenging. We evaluate multimodal LLMs for generating explanations of ocular diagnoses, combining Vision Transformer-derived saliency maps with clinical metadata. After finetuning RETFound for improved performance on the BRSET dataset (AUC-ROC 0.9664/0.8611 for diabetic retinopathy/glaucoma), we compared five LLMs through technical and clinical evaluations. GPT-o1 demonstrated superior performance across technical dimensions and clinical metrics (79.32% precision, 77.18% recall, 78.25% F1, 20.68% hallucination rate). Our findings highlight the importance of underlying diagnostic accuracy and advanced model architecture for generating reliable clinical explanations, suggesting opportunities for integrated verification mechanisms in future developments. The code and details can be found at: https://github.com/YatingPan/ocular-llm-explainability.
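The clinical metrics quoted above can be illustrated with a minimal sketch. It assumes, hypothetically, that each generated explanation is broken into individual claims that a reviewer marks as supported or unsupported, and that reference findings absent from the explanation count as missed. The function names and the counting scheme are illustrative and are not taken from the authors' repository. The sketch also assumes the hallucination rate is the share of generated claims that are unsupported, i.e. 1 minus precision. This is consistent with the reported figures (79.32% + 20.68% = 100%), but the abstract does not state it.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def explanation_metrics(supported: int, unsupported: int, missed: int) -> dict:
    """Claim-level metrics for generated explanations (hypothetical scheme).

    supported   -- generated claims confirmed by the reviewer
    unsupported -- generated claims the reviewer could not confirm
    missed      -- reference findings the explanation left out
    """
    generated = supported + unsupported
    relevant = supported + missed
    precision = supported / generated if generated else 0.0
    recall = supported / relevant if relevant else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        # Assumption: hallucination rate = unsupported share of generated claims.
        "hallucination_rate": unsupported / generated if generated else 0.0,
    }


if __name__ == "__main__":
    # Toy counts, not data from the paper.
    print(explanation_metrics(supported=8, unsupported=2, missed=2))
    # F1 recomputed from the precision and recall reported in the abstract.
    print(round(100 * f1_score(0.7932, 0.7718), 2))
```

Recomputing F1 from the reported precision and recall gives roughly 78.24%, close to the reported 78.25%. The small gap would be expected if the published value was computed from unrounded inputs or averaged per case.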
Yating Pan
Janna Hastings
University of Zurich
SIB Swiss Institute of Bioinformatics
University of St. Gallen
www.synapsesocial.com/papers/689dfe97d61984b91e13bff0 — DOI: https://doi.org/10.3233/shti250910