
June 17, 2026 · Open Access

Explainable AI in Clinician-Facing Clinical Decision Support: A Critical Systematic Review and Evidence Map of Human-Centered Evaluations

Key Points

This review aims to synthesize evaluations of explainable AI in clinician-facing clinical decision support systems and to create an evidence map of outcomes.
Database searches of PubMed, Google Scholar, and Semantic Scholar were conducted through January 2026.
Studies were included if they empirically evaluated AI-based decision support with an explanation condition and reported human-centered outcomes.
Critical appraisal and evidence mapping were performed, with effect directions coded across outcome families.
Thirty-four studies met inclusion criteria; explanations often increased perceived trust and acceptance but did not reliably improve decision quality.
In several studies, explanations worsened diagnostic accuracy when AI advice was incorrect or biased.
Counterfactual and retrieval-based explanations showed promise for reducing over-reliance on incorrect AI outputs, while generic feature-attribution displays offered limited added value beyond AI advice alone.
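The evidence-mapping step described in the Key Points, in which each study's effect direction is coded as positive, mixed/null, or negative within an outcome family, is essentially a cross-tabulation. The Python sketch below illustrates that bookkeeping under stated assumptions: the record layout, the `evidence_map` helper, and the example findings are hypothetical and are not the review's extraction data or tooling.

```python
from collections import Counter

DIRECTIONS = ("positive", "mixed/null", "negative")

# Hypothetical coded findings: (explanation type, outcome family, effect direction).
# Illustrative placeholders only; not the review's extracted data.
coded_findings = [
    ("feature attribution", "decision quality", "mixed/null"),
    ("feature attribution", "decision quality", "negative"),
    ("feature attribution", "trust/acceptance", "positive"),
    ("counterfactual", "reliance calibration", "positive"),
    ("retrieval-based", "reliance calibration", "positive"),
]


def evidence_map(findings):
    """Tally coded effect directions for each (explanation type, outcome family) cell."""
    for _, _, direction in findings:
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown effect direction: {direction!r}")
    counts = Counter(findings)
    # Only cells that actually contain at least one coded finding are mapped.
    cells = sorted({(expl, outcome) for expl, outcome, _ in findings})
    return {
        (expl, outcome): {d: counts[(expl, outcome, d)] for d in DIRECTIONS}
        for expl, outcome in cells
    }


for (expl, outcome), tally in evidence_map(coded_findings).items():
    print(f"{expl:<20} {outcome:<22} {tally}")
```

An evidence map that also tracks evaluation setting (for example, vignette versus workflow-embedded) would simply add that field to each record and to the cell key.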

Abstract

Explainable AI (XAI) in clinician-facing clinical decision support (CDS) is increasingly promoted to enhance transparency, yet prior evidence suggests that explanations do not consistently improve clinical decision-making and may occasionally exacerbate errors. This critical systematic review and evidence map aimed to (i) synthesize human-centered evaluations of explainable clinician-facing CDS, and (ii) construct an evidence map linking explanation types, clinical tasks, evaluation settings, and outcome directions for decision quality, reliance calibration, and usability. Database searches were conducted in PubMed, Google Scholar, and Semantic Scholar through January 2026. Studies were included if they empirically evaluated an AI-based CDS system with an explanation condition, involved clinicians or trainees performing clinical decision tasks, and reported human-centered outcomes. Thirty-four studies met inclusion criteria. Data extraction, critical appraisal, and evidence mapping were performed, with effect directions coded as positive, mixed/null, or negative across outcome families. Included studies disproportionately used vignette, reader, or simulation paradigms rather than workflow-embedded deployments. Across larger controlled experiments, explanations frequently increased perceived trust and acceptance but did not reliably improve decision quality. In several large studies, explanations worsened diagnostic accuracy when AI advice was incorrect or biased. The most promising signals for reliance calibration concentrated on counterfactual and retrieval-based explanations, which reduced over-reliance on incorrect AI outputs. In contrast, generic feature-attribution displays (e.g., SHAP) showed limited incremental benefit beyond AI advice alone. Some studies reported increased cognitive load and task time with explanations, particularly when explanations were dense or poorly integrated.
Explanations in clinician-facing CDS often increase perceived trust and acceptance without reliably improving decision quality, and they can amplify harm when AI advice is incorrect or biased. Future evaluations should prioritize appropriate-reliance metrics stratified by AI correctness, incorporate objective workload and attention measures, and test explanation interfaces in workflow-realistic settings.
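One way to operationalize appropriate-reliance metrics stratified by AI correctness is to split trials by whether the AI advice was correct and then compute how often clinicians followed it within each stratum. The sketch below assumes a binary agree/override outcome per trial; the `Trial` type, the `reliance_by_ai_correctness` function, the metric names, and both arms' trial counts are hypothetical illustrations, not figures from the included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One clinician decision on one case (hypothetical record layout)."""
    ai_correct: bool   # was the AI advice correct for this case?
    followed_ai: bool  # did the clinician's final decision match the AI advice?


def _rate(trials, predicate):
    """Fraction of trials satisfying predicate; NaN if the stratum is empty."""
    if not trials:
        return float("nan")
    return sum(1 for t in trials if predicate(t)) / len(trials)


def reliance_by_ai_correctness(trials):
    """Reliance rates stratified by whether the AI advice was correct."""
    correct = [t for t in trials if t.ai_correct]
    incorrect = [t for t in trials if not t.ai_correct]
    return {
        # AI correct: accepting is appropriate, overriding is under-reliance.
        "accept_correct_ai": _rate(correct, lambda t: t.followed_ai),
        "under_reliance": _rate(correct, lambda t: not t.followed_ai),
        # AI incorrect: accepting is over-reliance, overriding is appropriate.
        "over_reliance": _rate(incorrect, lambda t: t.followed_ai),
        "reject_incorrect_ai": _rate(incorrect, lambda t: not t.followed_ai),
        # Unstratified agreement, shown for contrast.
        "overall_agreement": _rate(trials, lambda t: t.followed_ai),
    }


# Hypothetical arms of 20 trials each (10 with correct AI advice, 10 with incorrect).
no_explanation = (
    [Trial(True, True)] * 8 + [Trial(True, False)] * 2
    + [Trial(False, True)] * 3 + [Trial(False, False)] * 7
)
with_explanation = (
    [Trial(True, True)] * 9 + [Trial(True, False)] * 1
    + [Trial(False, True)] * 6 + [Trial(False, False)] * 4
)

for name, arm in (("no explanation", no_explanation), ("with explanation", with_explanation)):
    print(name, reliance_by_ai_correctness(arm))
```

With these made-up counts, overall agreement with the AI rises from 0.55 to 0.75 in the explanation arm while over-reliance doubles from 0.3 to 0.6; an unstratified agreement or acceptance measure would show only the first change. Note that rejecting incorrect advice does not guarantee a correct final decision when a task has more than two options.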
