Large Language Models Are Poor Clinical Decision-Makers: A Comprehensive Benchmark | Synapse