Abstract

Background: Large language models (LLMs) show potential to support antimicrobial prescribing but require simulation‐based, institution‐specific safety evaluation before any clinical use is considered. In Australia, antimicrobial prescribing is a high‐risk domain for digital decision‐support systems because of its implications for patient safety and antimicrobial resistance.

Aim: To characterise the prescribing accuracy, error phenotypes and antimicrobial stewardship risk of an LLM given publicly available surgical prophylaxis guidelines at inference time across 20 simulated surgical scenarios.

Methods: Twenty simulated surgical scenarios were tested using an LLM that was prompt‐conditioned with publicly available guideline text during inference, without any fine‐tuning or modification of model weights. For each case, the model generated recommendations for agent, dose, timing and re‐dosing, together with a guideline citation. Outputs were independently assessed by two local clinicians familiar with the guideline. Accuracy was scored across five domains, and harm was classified using a modified National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) Index.

Results: Clinically significant antimicrobial prescribing risk was identified in 2 of 20 simulated scenarios (10%), although confidence intervals are wide given the small sample size. These errors were omission of required anaerobic coverage and failure to re‐dose prophylaxis in prolonged procedures. Overall guideline concordance was 4/5, with perfect dose accuracy but lower performance for timing (70%) and guideline citation (45%).

Conclusions: This study demonstrates the feasibility of building institutionally governed, guideline‐based AI systems and identifies stewardship‐relevant safety risks that currently preclude clinical use without further validation.
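To make the "wide confidence intervals" caveat concrete, the sketch below computes 95% Wilson score intervals for the reported proportions. The abstract does not state which interval method was used. The Wilson interval is an assumption chosen here because it behaves reasonably for small samples, and `wilson_interval` is a hypothetical helper name. The 2/20 count comes from the abstract. The 14/20 (timing) and 9/20 (citation) counts are inferred by assuming that the 70% and 45% figures are also proportions of the same 20 scenarios.

```python
import math

# 95% two-sided critical value of the standard normal distribution.
Z_95 = 1.959963984540054


def wilson_interval(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion successes/n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p_hat + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point drift at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # 2/20 is reported directly; 14/20 and 9/20 are ASSUMED counts
    # behind the reported 70% (timing) and 45% (citation) figures.
    for label, x in [("clinically significant risk", 2), ("timing", 14), ("citation", 9)]:
        lo, hi = wilson_interval(x, 20)
        print(f"{label}: {x}/20 = {x / 20:.0%} (95% CI {lo:.1%} to {hi:.1%})")
```

Under these assumptions, the 2/20 estimate is compatible with a true risk anywhere from roughly 2.8% to 30.1%. A 10% point estimate from 20 scenarios therefore cannot, on its own, establish that the error rate is acceptably low.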
Andrews et al. (Wed,) studied this question.