Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark | Synapse