What question did this study set out to answer?

To assess the performance of a large language model in adjudicating surgical site infections.

April 15, 2026 (Open Access)

Use of a large language model integrated within the electronic medical record for the evaluation of surgical site infections – Northern California, 2025

Key Points

Assessed the performance of a large language model in adjudicating surgical site infections.
Evaluated gpt-4o-mini integrated within the electronic medical record.
Measured sensitivity and specificity for surgical site infection detection.
Compared the manual screening workload before and after implementation.
Achieved 100% sensitivity in detecting surgical site infections.
Recorded 69.4% specificity, indicating a high rate of false positives.
Reduced the manual screening workload by 66%.

Abstract

Our study evaluated a large language model (gpt-4o-mini) for surgical site infection (SSI) adjudication. The model achieved 100% sensitivity but only 69.4% specificity. Although it reduced the manual screening workload by 66%, the agent generated many false positives, underscoring the need for refined models that improve specificity without compromising sensitivity.
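The metrics above can be illustrated with a short sketch. Sensitivity and specificity follow their standard definitions, TP / (TP + FN) and TN / (TN + FP), computed against manual adjudication as the reference standard. The workload-reduction formula is an assumption for illustration only: it treats every case the model clears as one a reviewer no longer screens by hand. The study may define it differently. The class and function names and the counts are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Model SSI flags compared against manual adjudication (reference standard)."""
    tp: int  # true SSIs the model flagged
    fn: int  # true SSIs the model missed
    tn: int  # non-SSIs the model correctly cleared
    fp: int  # non-SSIs the model flagged (false positives)

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true SSIs the model flagged: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-SSIs the model cleared: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def workload_reduction(c: ConfusionCounts) -> float:
    """ASSUMED definition: fraction of all cases the model clears, so reviewers
    only adjudicate the flagged cases (TP + FP) by hand."""
    return (c.tn + c.fn) / c.total


# Hypothetical counts for illustration only; NOT the study's results.
example = ConfusionCounts(tp=8, fn=0, tn=70, fp=22)
print(f"sensitivity={sensitivity(example):.3f}")
print(f"specificity={specificity(example):.3f}")
print(f"workload reduction={workload_reduction(example):.0%}")
```

With no missed infections (FN = 0), sensitivity stays at 100% however many false positives occur. Each false positive lowers specificity and, under the assumed definition, also shrinks the workload saving, because every flagged case still needs manual review.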
