Performance of large language models on neuroanatomy-based medical riddles: a comparative study

Large language models vary widely in how well they solve neuroanatomy riddles, yet the results still point to their potential utility.
The average correct response rate of the models was assessed, revealing a wide range in accuracy across different riddles.
The models were compared with one another to identify their strengths and weaknesses in solving medical riddles.
Results may inform future developments in using language models for clinical applications in neuroanatomy.
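
As a rough illustration of the average correct response rate mentioned above, the sketch below computes a per-model and a per-riddle mean from repeated attempts. It is a minimal sketch under stated assumptions, not the study's actual method: the data layout, the function names `correct_response_rate` and `summarize`, and the model and riddle labels are all hypothetical, and the values are placeholders rather than results from the study.

```python
from statistics import mean


def correct_response_rate(outcomes):
    """Fraction of attempts answered correctly; outcomes is a list of booleans."""
    if not outcomes:
        raise ValueError("need at least one attempt")
    return sum(outcomes) / len(outcomes)


def summarize(results):
    """Take {model: {riddle: [bool, ...]}} and return (per_model, per_riddle) mean rates."""
    # Average each model's per-riddle rates.
    per_model = {
        model: mean(correct_response_rate(attempts) for attempts in riddles.values())
        for model, riddles in results.items()
    }
    # Average each riddle's rate across the models that attempted it.
    riddle_ids = sorted({r for riddles in results.values() for r in riddles})
    per_riddle = {
        r: mean(
            correct_response_rate(results[m][r]) for m in results if r in results[m]
        )
        for r in riddle_ids
    }
    return per_model, per_riddle


# Hypothetical placeholder data (assumption): two models, two riddles, four attempts each.
example = {
    "model_a": {
        "riddle_1": [True, True, False, True],
        "riddle_2": [False, False, True, False],
    },
    "model_b": {
        "riddle_1": [True, True, True, True],
        "riddle_2": [False, True, True, False],
    },
}

per_model, per_riddle = summarize(example)
print(per_model)
print(per_riddle)
```

Keeping the per-riddle and per-model averages separate makes it possible to see both which riddles are hard for every model and which models are weaker overall.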
