March 13, 2026 · Open Access

Safety of a large language model-based clinical decision support system in African primary healthcare

Key Points

Aimed to evaluate the safety and efficacy of a large language model-based clinical decision support system in African primary healthcare settings.
Conducted a retrospective evaluation of the system, which was embedded in the clinics' electronic medical records and deployed between July and September 2024.
Reviewed 1,469 records from 16 primary care clinics in Kenya.
Physicians assessed the accuracy and alignment of the model's recommendations with local guidelines.
Hallucinations occurred in 3.4% of encounters, most often as misexpanded acronyms or drug names.
Clinical management guidance aligned with local guidelines in 99% of cases.
Actively harmful recommendations were found in 115 encounters (7.8%), and 67 of them appeared in the final documentation.

Abstract

Here we conducted a retrospective evaluation of an electronic medical record-embedded large language model clinical decision support system deployed across 16 primary care clinics in Kenya, between July and September 2024. A panel of trained physicians reviewed 1,469 records. Hallucinations were uncommon, occurring in 50 encounters (3.4%, 95% confidence interval (CI) 2.5–4.5), and most often involved misexpanded acronyms or drug names. Clinical management guidance aligned with local guidelines in almost all cases (1,455; 99%, 95% CI 98.4–99.5). Despite this, clinicians did not modify documentation in 917 encounters (62%, 95% CI 59.9–64.9). Safety assessments identified actively harmful recommendations from the large language model in 115 encounters (7.8%, 95% CI 6.5–9.3), with 67 such recommendations appearing in the final documentation. Conversely, risk present in the clinician’s initial notes was fully mitigated in 118 encounters (8.0%, 95% CI 6.7–9.5 overall; 12.1%, 95% CI 9.5–15.2 of amended cases). Overall, the tool showed strong potential to support quality improvement, but the asymmetric adoption of harmful versus beneficial outputs underscores the need for usability optimization, local guardrails and prospective trials to confirm patient-level benefit.
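The percentages and confidence intervals in the abstract can be approximated from the reported counts. The sketch below is illustrative only: the abstract does not say which interval method the authors used, so the Wilson score interval is an assumption here, and the function name `wilson_interval` and the dictionary labels are hypothetical. Results may differ from the published bounds by roughly a tenth of a percentage point.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%).

    Assumption: the source does not name its interval method; Wilson is used
    here only as a common, dependency-free choice.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the abstract; the labels are illustrative.
TOTAL_RECORDS = 1469
counts = {
    "hallucinations": 50,
    "guideline-aligned guidance": 1455,
    "documentation not modified": 917,
    "actively harmful recommendations": 115,
    "initial risk fully mitigated": 118,
}

for label, k in counts.items():
    low, high = wilson_interval(k, TOTAL_RECORDS)
    print(f"{label}: {k}/{TOTAL_RECORDS} = {k / TOTAL_RECORDS:.1%} "
          f"(95% CI {low:.1%} to {high:.1%})")
```

An exact (Clopper-Pearson) interval would be a reasonable alternative and tends to be slightly wider for proportions near 0 or 1.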
