A statistical framework for evaluating repeatability and reproducibility of large language models in diagnostic reasoning | Synapse