Abstract

This study evaluates the quality of artificial intelligence (AI) clinical note summarization by analyzing physician qualitative feedback on a large language model (LLM) chart review tool integrated into the electronic health record (EHR). Physicians provided free-text feedback on AI-generated chart summaries, which physician informaticists analyzed using MAXQDA. Feedback from 10 physicians was collected on 147 AI-generated summaries. Positive feedback was common (n = 71), but users identified omissions (n = 46), confusing content (n = 20), token limitations (n = 27), hallucinations (n = 5), and bias (n = 1). Cohen’s kappa was 0.64, indicating substantial reviewer agreement. Physician feedback on the tool revealed overall positive impressions, though omissions raised concerns about summary completeness. AI-assisted chart review technology is not infallible, but physicians found this tool acceptable for use in clinical workflows.
Kahl et al. studied this question.
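For readers unfamiliar with the agreement statistic reported in the abstract, the following is a minimal sketch of how Cohen's kappa can be computed for two reviewers who each assign one code per item. It uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohens_kappa` and the example labels are hypothetical and are not taken from the study; the authors' own coding scheme and software workflow may differ.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies,
    # summed over labels.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for illustration only; NOT data from the study.
reviewer_1 = ["positive", "omission", "positive", "confusing", "omission", "positive"]
reviewer_2 = ["positive", "omission", "omission", "confusing", "omission", "positive"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

Under the commonly cited Landis and Koch benchmarks, values from 0.61 to 0.80 are described as substantial agreement, which is the band the reported 0.64 falls into.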