March 3, 2026 · Open Access

Evaluating reasoning large language models with human-like thinking in ophthalmic question answering

Key Points

Reasoning large language models (LLMs) achieved higher accuracy in ophthalmic question answering than conventional non-reasoning LLMs.
DeepSeek-R1 achieved the highest accuracy of the models evaluated, illustrating the potential of LLMs to simulate human-like thinking.
The analysis compared reasoning LLMs with conventional non-reasoning LLMs to show where simulated thinking processes give them an advantage.
These findings suggest that integrating reasoning capabilities may enable more trustworthy LLM systems in ophthalmology.

Abstract

Reasoning LLMs outperformed conventional non-reasoning LLMs in ophthalmology question answering, with DeepSeek-R1 achieving the highest accuracy (ACC). Our findings indicate that reasoning LLMs simulate human-like thinking processes better than conventional non-reasoning LLMs, suggesting their potential for building more trustworthy LLM systems in ophthalmology.

Cite This Study

Wang et al. (2026). Evaluating reasoning large language models with human-like thinking in ophthalmic question answering.

synapsesocial.com/papers/69a75b46c6e9836116a2258d
https://doi.org/10.1136/bmjophth-2025-002615