Abstract Objective To compare the clinical decision–support performance of 2 large language models (LLMs), ChatGPT-5 and ChatGPT-5 Thinking, with that of experienced clinicians and novices in veterinary theriogenology. Methods 15 standardized obstetric and gynecologic scenarios were independently evaluated by 2 expert clinicians, 2 novice veterinarians, and both LLMs under matched, cold-start conditions. Responses were assessed with a 5-point global quality score by a blinded expert panel. Results ChatGPT-5 Thinking achieved the highest overall quality ratings, followed by ChatGPT-5 and the expert clinicians. Novice veterinarians received the lowest scores. Responses generated by LLM were generally more consistent and complete than those of human readers. Conclusions Within the constraints of a simulated scenario design, LLMs, particularly ChatGPT-5 Thinking, provided clinically appropriate guidance that exceeded novice performance and approached that of expert clinicians. These findings support the potential role of LLMs as adjunct decision-support tools in time-sensitive obstetric and gynecologic cases. Clinical Relevance LLMs may assist clinicians and trainees in managing reproductive emergencies by offering rapid, structured, guideline-aligned recommendations. Further evaluation in real clinical settings is warranted.
Okur et al. (Fri,) studied this question.