March 22, 2026 · Open Access

Feasibility of a Large Language Model Chatbot to Support Parental Understanding in the PICU

Key Points

To assess the feasibility of a large language model (LLM)-based chatbot for answering parental questions in the PICU and to prepare for a future randomized controlled trial (RCT)
Conducted a single-arm feasibility study
Engaged 14 parents in 10-minute sessions with a GPT-4o chatbot
Assessed parental engagement, satisfaction, provider perceptions, accuracy, and recruitment metrics
14 of 16 eligible parents enrolled (87.5% recruitment rate)
Parents expressed high satisfaction: 96% positive real-time ratings and a median post-interaction survey score of 5.0/6.0
99.3% of 1,225 chatbot-generated sentences were accurate, and all eight errors were minor
Healthcare providers rated response quality highly (median score of 5.0/6.0)
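As a quick arithmetic check, the recruitment and accuracy percentages follow directly from the counts reported in the abstract (14 of 16 eligible parents enrolled; 8 errors among 1,225 evaluated sentences). The Python sketch below only reproduces that arithmetic; the helper name `percent` is illustrative and is not taken from the study's analysis code.

```python
# Reproduces the headline proportions from the counts stated in the abstract.
# The helper below is a hypothetical illustration, not the authors' code.

def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator

eligible, enrolled = 16, 14   # parents eligible vs. enrolled
sentences, errors = 1225, 8   # chatbot sentences evaluated vs. sentences with errors

recruitment_rate = percent(enrolled, eligible)          # exactly 87.5
accuracy_rate = percent(sentences - errors, sentences)  # 1217/1225, about 99.35

print(f"Recruitment rate: {recruitment_rate:.1f}%")
print(f"Sentence-level accuracy: {accuracy_rate:.1f}%")
```

Running it prints 87.5% and 99.3%, matching the reported figures to one decimal place.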

Abstract

OBJECTIVES: To evaluate the feasibility of a large language model (LLM)-based chatbot for answering parental questions in the PICU and to inform the design of a randomized controlled trial (RCT).

DESIGN: Prospective single-arm feasibility study conducted from August 2024 to December 2024.

SETTING: Quaternary PICU.

SUBJECTS: Fourteen parents of children admitted to the PICU.

INTERVENTIONS: Parents engaged in 10-minute sessions with a HIPAA-compliant chatbot based on GPT-4o (Generative Pretrained Transformer 4o; OpenAI, San Francisco, CA) and prompted with patient-specific electronic health record (EHR) data.

MEASUREMENTS AND MAIN RESULTS: Feasibility was assessed against four criteria: parental engagement and satisfaction, provider perceptions, accuracy and safety, and recruitment. Of 16 eligible parents, 14 enrolled and completed all procedures (87.5% recruitment rate). Parents asked a median of six questions (range, 3–13), and 96% of real-time satisfaction ratings were positive. Post-interaction surveys demonstrated high perceived value (median, 5.0/6.0 across all domains; Net Promoter Score [NPS], +57). Of 1,225 chatbot-generated sentences evaluated, 99.3% were accurate, and all eight errors were classified as minor (inter-rater reliability by Gwet’s AC2, a chance-corrected agreement coefficient: 0.98; 95% CI, 0.97–0.99). Healthcare providers rated response quality highly (median, 5.0/6.0), although physicians expressed greater comfort than nurses with bedside use of the tool (5.0 vs. 4.0; p = 0.004). Sample size calculations using NPS as the primary endpoint suggest that enrolling 135 participants would provide adequate power for a future RCT.

CONCLUSIONS: An EHR-informed LLM chatbot demonstrated high parental engagement and satisfaction, positive provider perceptions, and high accuracy and safety, supporting progression to an RCT.
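The abstract reports an NPS of +57 but does not restate how the score is computed. Assuming the survey used the conventional 0–10 "how likely are you to recommend" item (the abstract does not say so), NPS is the percentage of promoters (scores 9–10) minus the percentage of detractors (scores 0–6), with passives (7–8) counting only in the denominator. The Python sketch below illustrates that standard definition; the function name and the example responses are hypothetical and are not study data.

```python
from typing import Iterable

def net_promoter_score(scores: Iterable[int]) -> float:
    """Conventional NPS on a 0-10 likelihood-to-recommend scale.

    Promoters score 9-10 and detractors 0-6; passives (7-8) count only
    in the denominator. The result ranges from -100 to +100.
    """
    values = list(scores)
    if not values:
        raise ValueError("need at least one response")
    if any(not 0 <= s <= 10 for s in values):
        raise ValueError("scores must lie on the 0-10 scale")
    promoters = sum(1 for s in values if s >= 9)
    detractors = sum(1 for s in values if s <= 6)
    return 100.0 * (promoters - detractors) / len(values)

# Hypothetical responses for illustration only; these are NOT study data.
example = [10, 9, 9, 8, 7, 6, 10]
print(f"NPS: {net_promoter_score(example):+.0f}")  # 4 promoters, 1 detractor, n = 7
```

Because NPS is a difference of two percentages, each respondent in a sample of 14 shifts it by 100/14, or about 7.1 points, so a feasibility-sized sample yields only a coarse estimate of the score.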
