What question did this study set out to answer?

To compare the decision-support performance of large language models with experienced clinicians and novices in veterinary theriogenology.

February 17, 2026

Performance of large language models versus clinicians and novices in veterinary theriogenology decision support

Key Points

Evaluated 15 standardized obstetric and gynecologic scenarios
Participants included 2 expert clinicians, 2 novice veterinarians, and 2 LLMs (ChatGPT-5 and ChatGPT-5 Thinking)
Responses assessed with a 5-point global quality score by a blinded panel
ChatGPT-5 Thinking received the highest quality ratings
ChatGPT-5 and the expert clinicians followed
Novice veterinarians scored the lowest
LLM responses were generally more consistent and complete than human responses
LLMs provided guidance that approached expert-level support

Abstract

Objective

To compare the clinical decision-support performance of 2 large language models (LLMs), ChatGPT-5 and ChatGPT-5 Thinking, with that of experienced clinicians and novices in veterinary theriogenology.

Methods

Fifteen standardized obstetric and gynecologic scenarios were independently evaluated by 2 expert clinicians, 2 novice veterinarians, and both LLMs under matched, cold-start conditions. Responses were assessed with a 5-point global quality score by a blinded expert panel.

Results

ChatGPT-5 Thinking achieved the highest overall quality ratings, followed by ChatGPT-5 and the expert clinicians. Novice veterinarians received the lowest scores. LLM-generated responses were generally more consistent and complete than those of the human participants.

Conclusions

Within the constraints of a simulated scenario design, LLMs, particularly ChatGPT-5 Thinking, provided clinically appropriate guidance that exceeded novice performance and approached that of expert clinicians. These findings support the potential role of LLMs as adjunct decision-support tools in time-sensitive obstetric and gynecologic cases.

Clinical Relevance

LLMs may assist clinicians and trainees in managing reproductive emergencies by offering rapid, structured, guideline-aligned recommendations. Further evaluation in real clinical settings is warranted.

Cite This Study

Okur et al. studied this question.

synapsesocial.com/papers/699405254e9c9e835dfd5f37
https://doi.org/10.2460/javma.25.09.0615
