Automating expert-level medical reasoning evaluation of large language models | Synapse