Performance and reliability of large language models on the European Board of Hand Surgery examination: a multi-model evaluation study | Synapse