What type of study is this?

This is an experimental study.

September 30, 2025

Clinical AI Scribes in primary care: accuracy, error severity and implications for clinical practice

Key Points

Omissions accounted for 83.8% of the errors in summaries generated by clinical AI scribes.
Impact Score analysis showed that some error types, though less frequent, carried higher clinical severity.
Evaluation of documentation using the PDQI-10 revealed weaknesses in succinctness and organization of CAIS summaries.
Vigilance is essential for clinicians, who must check for omitted details and possible inaccuracies in AI-generated summaries.

Abstract

Objectives: To investigate the performance of commercially available Clinical Artificial Intelligence Scribes (CAISs), assessing their accuracy, potential clinical impact of errors, and documentation quality, given growing concerns around errors and safety.

Methods and analysis: Seven CAIS products were investigated, using eight standardised clinical consultation scenarios recorded as audio. CAIS-generated summaries were assessed against a human-validated transcript and evaluated for errors (omissions, factual inaccuracies and hallucinations). Error severity was rated by medical doctors, generating a novel severity-weighted Impact Score (linear and exponential variants) to quantify potential clinical impact. Further analysis using the Physician Documentation Quality Instrument (PDQI-10), a validated clinical note quality score, reinforced the findings.

Results: Omissions dominated error counts (83.8%, p<<0.001), with CAISs varying widely in error frequency and severity, and a median of 1–6 omissions per consultation (depending on CAIS). Although less frequent, hallucinations and factual inaccuracies were more often clinically serious. No tested CAIS produced error-free summaries. The Impact Score highlighted clinical severity, notably amplifying the significance of less frequent but high-severity errors. PDQI-10 analysis indicated summaries were weakest in succinctness and organisation, but strong in consistency and clinical usefulness.

Conclusions: The CAISs demonstrate high levels of summarisation accuracy. However, there is great disparity between the currently available CAIS products and, while some perform well, none are perfect. Clinicians should therefore maintain vigilance, particularly checking omitted psychosocial details and medications, and scrutinising plausible-sounding insertions. Purchasers and regulators should be aware of the significant performance disparities identified, reinforcing the need for careful evaluation and selection of CAIS products.
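The abstract does not give the formula behind the severity-weighted Impact Score, so the following is only a minimal sketch of how linear and exponential weighting of error severities might behave. The 1 to 4 severity scale, the function names, the base of 2 and the example ratings are all assumptions made for illustration, not the authors' method or data.

```python
from typing import Iterable

# ASSUMPTION: severities are integers on a hypothetical 1 (minor) to 4 (severe)
# scale. The study's actual rating scale and weights are not stated in the abstract.


def linear_impact(severities: Iterable[int]) -> float:
    """Sum the ratings, so each error counts in proportion to its severity."""
    return float(sum(severities))


def exponential_impact(severities: Iterable[int], base: float = 2.0) -> float:
    """Sum base**(severity - 1), so high-severity errors dominate the total."""
    return float(sum(base ** (s - 1) for s in severities))


# Made-up ratings for two hypothetical summaries (not data from the study).
many_minor = [1] * 7   # seven low-severity omissions
few_severe = [3, 3]    # two higher-severity hallucinations

print(linear_impact(many_minor), linear_impact(few_severe))
print(exponential_impact(many_minor), exponential_impact(few_severe))
```

Under these assumed weights, the summary with many minor omissions scores higher on the linear variant, while the one with a few severe errors scores higher on the exponential variant. That is the kind of amplification of infrequent but high-severity errors the abstract describes.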

Authors

Thomas C. Draper

Timothy M. Cox

Kathryn Lamb-Riddell

Institutions

University of the West of England

Taunton & Somerset NHS Foundation Trust

NHS England
