Synthetic benchmark validation does not transfer to real-data behavior for the tested prompt-level safety operationalizations in clinical LLMs: a multi-model multi-institution evaluation | Synapse