Explainable Artificial Intelligence (XAI) is proposed as essential for high-risk applications such as healthcare, where it aims to enhance user trust. However, studies often rely on automated metrics rather than user evaluation. We adapt a prototype-based XAI model for image-based gestational age (GA) estimation and evaluate its impact on trust, reliance, and performance, including a novel measure of appropriate reliance. Ten sonographers completed a three-stage reader study assessing the XAI model's impact on GA estimates. Model predictions reduced clinician mean absolute error (MAE) from 23.5 to 15.7 days, and adding explanations produced a further, non-significant reduction to 14.3 days. However, the impact of explanations varied across participants, and some performed worse with explanations than without. Additionally, although explanations increased participant confidence, they had no significant effect on trust in, or reliance on, the model. These counterintuitive results highlight potential pitfalls in deploying XAI and emphasise the need for human studies that capture clinician variability.
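For readers unfamiliar with the headline metric, the following is a minimal sketch of how a mean absolute error in days could be computed for a set of GA estimates. The function name and the sample values are hypothetical and chosen only for illustration; they are not taken from the study, and the paper's own analysis pipeline may differ.

```python
from statistics import fmean


def mean_absolute_error(estimates_days, reference_days):
    """Return the mean absolute difference, in days, between estimated
    and reference gestational ages (hypothetical helper, not from the paper)."""
    if len(estimates_days) != len(reference_days):
        raise ValueError("inputs must have the same length")
    if not estimates_days:
        raise ValueError("inputs must be non-empty")
    return fmean(abs(e - r) for e, r in zip(estimates_days, reference_days))


# Made-up values for illustration only; these are NOT data from the study.
reference = [140, 175, 210, 245]   # reference GA in days
estimates = [150, 160, 230, 240]   # one reader's GA estimates in days
print(mean_absolute_error(estimates, reference))
```

Computing the same quantity per reader and per study stage would allow a comparison like the one reported above (unaided, with model predictions, with explanations).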
Nicolson et al. studied this question.