Measuring The Accuracy and Reproducibility of DeepSeek R1, Claude 3.5 Sonnet, and GPT‑4.1 on Complex Clinical Scenarios | Synapse