Reasoning LLMs demonstrated superior performance in ophthalmology question answering, with DeepSeek-R1 achieving the highest accuracy (ACC). Our findings indicate that reasoning LLMs can better simulate human-like thinking processes than conventional non-reasoning LLMs, suggesting their potential for building more trustworthy LLM systems in ophthalmology.
Wang et al. studied this question.