October 2, 2025

Quality assessment of artificial intelligence‐generated versus human‐written hospital summaries evaluating detail, usefulness, and continuity of care

Key Points

AI-generated summaries outperformed human-written ones in quality and comprehensiveness, suggesting higher potential for standardization.
After score normalization, LLM-generated summaries scored significantly higher than human-written ones on every criterion (p < .05); the largest gap was in comprehensiveness, a median difference of 0.85 (0.62 vs. −0.23).
Human-written summaries used simpler language (Flesch Reading Ease 33.11 vs. 26.2; p < .05), yet LLM summaries scored higher on utility and detail.
The findings suggest that AI drafting tools could improve the efficiency and standardization of hospital documentation.
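
The normalized scores reported above can be negative, which is what one would expect if raw rubric ratings were centered before comparison. The summary does not say which normalization the authors used, so the sketch below assumes pooled z-scoring purely for illustration. The function `z_normalize` and the raw ratings are hypothetical and are not data from the study; only the two reported medians (0.62 and −0.23) come from the text.

```python
from statistics import mean, median, pstdev


def z_normalize(scores):
    """Center scores on their mean and scale by the population standard deviation.

    Assumption: z-scoring is used here only as one common normalization;
    the study's actual method is not stated in the summary.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: every normalized value is zero.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


# Hypothetical raw rubric ratings (NOT from the study).
llm_raw = [4, 5, 4, 5]
human_raw = [3, 2, 3, 4]

# Normalize over the pooled ratings so both groups share one scale.
pooled = z_normalize(llm_raw + human_raw)
llm_z = pooled[: len(llm_raw)]
human_z = pooled[len(llm_raw):]

# Difference in group medians on the normalized scale.
median_gap = median(llm_z) - median(human_z)

# The gap reported in the text follows from the two published medians.
reported_gap = round(0.62 - (-0.23), 2)
```

Under this assumption, a group that tends to score below the pooled mean ends up with a negative median, which would be consistent with the human-written value of −0.23.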

Abstract

Background: Hospital discharge summaries are critical for ensuring continuity of care, but their quality often varies. Large language models (LLMs) have the potential to standardize and enhance the efficiency of this documentation process.

Objectives: To evaluate the quality of hospital discharge summaries generated by an LLM‐based hospital course drafting tool developed by Epic Systems compared with human‐written summaries.

Methods: Retrospective study at a single tertiary‐care institution in 2024. The cohort included 100 adult hospitalizations lasting >72 hours across medical and surgical dismissing services. No interventions were performed. Summaries (LLM‐generated vs. human‐written) were independently reviewed using a standardized rubric covering nine domains (e.g., comprehensiveness, clarity, relevance). Scores were normalized and compared. Readability was assessed using Flesch Reading Ease.

Results: LLM‐generated summaries outperformed human‐written summaries across all criteria (p < .05), with the greatest difference observed in comprehensiveness (LLM median 0.62 vs. human −0.23). Human‐written summaries from surgical services scored lower than those from medical services, but LLM performance was consistent across both. Human summaries had higher Flesch Reading Ease scores (33.11 vs. 26.2; p < .05), reflecting simpler language.

Conclusions: LLM‐generated summaries demonstrated superior quality, consistency, and clinical utility compared with human‐written summaries, highlighting their potential to improve documentation efficiency and standardization.
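
For readers unfamiliar with the readability metric, Flesch Reading Ease is computed as 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), so higher scores indicate simpler text. The sketch below is a minimal illustration and not the tool used in the study: the vowel-group syllable counter is a rough heuristic assumed here for brevity, and the function names and sample sentences are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Return the Flesch Reading Ease score; higher means easier to read."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical sample texts, not taken from the study.
plain = "The cat sat. The dog ran."
clinical = (
    "Patient was admitted with decompensated heart failure "
    "requiring intravenous diuresis."
)

plain_score = flesch_reading_ease(plain)
clinical_score = flesch_reading_ease(clinical)
```

Long sentences and polysyllabic clinical vocabulary both lower the score, which is why dense discharge text tends to land near the bottom of the scale, as both groups of summaries did here (33.11 and 26.2).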
