What question did this study set out to answer?

The central aim is to assess whether large language models can effectively extract prognostic variables for neonatal intraventricular hemorrhage and its outcomes.

March 25, 2026

Harnessing Large Language Models in Neonatal IVH: Exploring RAG Methodology for Prognostic Variable Discovery

Key points

The central aim is to assess whether large language models can effectively extract prognostic variables for neonatal intraventricular hemorrhage and its outcomes.
Conducted a systematic literature review using retrieval-augmented generation (RAG) methodology
Utilized GPT-4 and Claude Sonnet to identify relevant studies
Extracted data following TRIPOD AI guidelines
Performed semi-automated extraction with manual validation
Identified 39 studies initially, with 28 meeting some or all of the validation criteria after hallucinated references were excluded
Extracted 14 distinct prognostic predictors across four outcomes: mortality, progression, complications, and resolution
Universal high-impact predictors included gestational age, birth weight, and APGAR scores
High-risk neonates showed >70% progression risk and >50% mortality, while low-risk neonates showed favorable outcomes

Abstract

Objective: To evaluate whether large language models (LLMs) can autonomously synthesize existing literature and accurately extract prognostic variables for neonatal intraventricular hemorrhage (IVH) and its outcomes, and to assess their capability for clinical feature ranking and risk stratification.

Study Design: This pilot study combined a systematic literature review with retrieval-augmented generation (RAG) methodology. GPT-4 (OpenAI) and Claude Sonnet 4.0 (Anthropic) were prompted to identify peer-reviewed studies that used machine learning or deep learning to predict IVH outcomes in preterm neonates. The models were prompted to extract data following TRIPOD AI guidelines, capturing study design, population characteristics, predictor variables, and outcome measures. Semi-automated RAG extraction was performed with manual validation to mitigate hallucination risk.

Results: The LLMs initially identified 39 studies, of which 28 met some or all of the validation criteria after hallucinated references were excluded. From these, 14 distinct prognostic predictors were extracted across four outcome domains: mortality, progression, complications, and resolution. Universal high-impact predictors included gestational age (13 mentions, 41%), birth weight (8 mentions, 25%), and APGAR scores (11 mentions, 34%). Variables were categorized into three clinical tiers based on frequency, outcome breadth, and modifiability. A preliminary risk stratification model indicated that high-risk neonates had a progression risk above 70% and mortality above 50%, while low-risk neonates (>32 weeks, >1500 g, APGAR >5) showed favorable trajectories.

Conclusions: This study demonstrates that LLMs can synthesize medical literature and extract clinically relevant prognostic variables for neonatal IVH outcomes. However, LLM outputs were susceptible to hallucinations and incomplete data synthesis, underscoring the need for rigorous clinical oversight and human validation to ensure reliability. The identified universal predictors provide a foundation for developing AI-assisted clinical decision support tools. Notable research gaps include the complete absence of resolution prediction studies and limited investigation of complication predictors, highlighting opportunities for future investigation in precision neonatology.
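
The percentages reported for the three universal predictors can be checked with a short calculation. The sketch below assumes each percentage is that predictor's share of the 32 combined mentions of these three variables; the abstract does not state the denominator, so this is an inference from the numbers, and the function name `mention_shares` is purely illustrative.

```python
# Mention counts for the three universal predictors, as reported in the abstract.
mentions = {"gestational age": 13, "birth weight": 8, "APGAR scores": 11}


def mention_shares(counts):
    """Return each item's share of the total count, as a whole-number percentage.

    Assumption: the denominator is the sum of the counts passed in
    (13 + 8 + 11 = 32 here); the abstract does not say this explicitly.
    """
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = mention_shares(mentions)
for name, pct in shares.items():
    print(f"{name}: {mentions[name]} mentions, {pct}%")
```

Under this assumption the computed shares match the reported 41%, 25%, and 34%, which suggests the percentages describe relative frequency among these three predictors and not the proportion of the 28 validated studies that mention each one.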
