Evaluating Large Language Models on Medical Evidence Summarization | Synapse