Importance Emergency departments (EDs) face a substantial documentation burden because they rely on unstructured clinical narratives, which hinders efficiency, particularly in pediatric care. Large language models (LLMs) offer a potential solution by automating data extraction to improve clinical workflows.

Objective To determine whether an LLM can accurately and efficiently extract structured clinical data from free-text pediatric ED records in a non-English setting.

Design Diagnostic accuracy study using retrospective data from 2007 to 2023. Manual classification by clinicians served as the gold standard for assessing model performance.

Setting Single-center study conducted at the pediatric ED of Padova University Hospital, a tertiary care referral center in Italy.

Participants A convenience sample of 697 anonymized ED records from children with complex medical conditions.

Exposure Automated data extraction using OpenAI's GPT-5.2 model via structured prompts processed in Python. All source texts were in Italian and were translated to English within the workflow.

Main Outcomes and Measures Primary outcomes were the accuracy, area under the curve (AUC), sensitivity, and specificity of the LLM in extracting triage color codes, ED outcomes, reasons for the ED visit, and performed procedures. Efficiency gains were measured by comparing manual and automated extraction times.

Results Among the 697 records analyzed, the primary model (GPT-5.2) achieved high accuracy in classifying triage color (0.99) and ED outcome (0.984). Accuracy was 0.96 for laboratory tests, 0.95 for oxygen therapy, and 0.987 for nasogastric tube placement. Results were consistent across all seven models evaluated (mean Fleiss' kappa = 0.922). Processing time fell from approximately 5 min to 6 s per record, at a total cost of €23.42.

Conclusions In this study of pediatric ED encounters in a non-English setting, LLMs reliably extracted structured clinical data and substantially reduced documentation processing time. These findings support their potential to streamline workflows, particularly in resource-constrained environments. Further research is warranted to improve the classification of complex or ambiguous information.
Brigiari et al. studied this question.
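The measures named in the abstract (accuracy, sensitivity, specificity, and Fleiss' kappa for agreement across models) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `binary_metrics` and `fleiss_kappa` and all label vectors are hypothetical toy examples, and only the record count and the per-record times (about 5 min manual, 6 s automated) are taken from the abstract.

```python
from collections import Counter


def binary_metrics(gold, pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels.

    `gold` is the clinician reference standard, `pred` the model output.
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list with one label per rater.

    Every item must be rated by the same number of raters (here: models).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]
    # Observed agreement: mean over items of the per-item agreement.
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    p_cat = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_exp = sum(p ** 2 for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)


# Toy labels (hypothetical): 1 = procedure performed, 0 = not performed.
gold = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(gold, pred)

# Toy agreement data (hypothetical): three models labelling two records.
kappa_perfect = fleiss_kappa([[1, 1, 1], [0, 0, 0]])
kappa_mixed = fleiss_kappa([[1, 1, 0], [0, 0, 1]])

# Time saved, using the abstract's figures: 697 records, ~300 s vs 6 s each.
hours_saved = 697 * (300 - 6) / 3600

print(metrics)
print(round(hours_saved, 1))
```

Because the manual time is reported only as approximately 5 min per record, the resulting estimate of hours saved should be read as an order-of-magnitude figure rather than a measured result.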