Evaluating large language models on medical evidence summarization | Synapse