Abstract

Background and aims: Accurate assessment of stroke severity and imaging findings is essential for patient management, yet documentation is often inconsistent. Large language models (LLMs) may enable efficient, standardized extraction of clinical metrics. This study evaluated the ability of LLMs to reproduce neurologists' scoring of the NIHSS, ASPECTS, and Oxfordshire Community Stroke Project (OCSP) classifications.

Methods: Neurological examination records and non-contrast CT scans were processed using five LLMs (GPT-4o™, Gemini-2.5™, DeepSeekV2™, Claude-4.0™, and Perplexity™) to extract NIHSS, ASPECTS, and OCSP scores. Agreement with neurologist ratings was assessed using bias (mean error), absolute deviation, intraclass correlation coefficient (ICC), and weighted Cohen's kappa.

Results: We included 487 patients (57.5% women; mean age 76.3±12.8 years). Mean NIHSS was 10.9; 32.2% had mild (5) and 23.6% had severe (≥16) stroke. Correlation with neurologist ratings was excellent, with a tendency toward NIHSS underestimation: mean error ranged from −0.09 (GPT™) to −1.78 (Claude™), except for DeepSeek™, which overestimated (+0.83). Absolute deviation was lowest for Gemini™ (1.08). ICC for NIHSS was excellent: Gemini™ (0.964), GPT™ (0.938). Kappa for NIHSS categorization (mild/moderate/severe) was 0.827 with Gemini™, indicating almost perfect agreement. For ASPECTS (mean 8.69±2.2), ICCs were uniformly excellent across models (0.929-0.968). Major ASPECTS misclassification (6-10 vs 0-5) was rare (1.2% with GPT-4o™ and Perplexity™). Agreement for OCSP classification was moderate, with the highest concordance for GPT™ (86.3%). LLM-based scoring reduced assessment time by up to 16.4 seconds per patient.

Conclusions: LLMs accurately reproduce neurologist-derived NIHSS and ASPECTS scores with minimal clinically relevant deviation, supporting their potential for scalable, automated extraction of stroke data from unstructured clinical records.
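As a rough illustration of the agreement metrics named in the Methods, the sketch below computes bias (mean signed error), mean absolute deviation, and a weighted Cohen's kappa on NIHSS severity categories. It is not the authors' analysis code. The function names, the category cut-offs (`mild_max=5`, `severe_min=16`), the choice of linear weights, and the example scores are all assumptions made for illustration; the abstract does not state the weighting scheme, and the ICC is not reproduced here.

```python
from collections import Counter


def bias_and_mad(reference, predicted):
    """Return (mean signed error, mean absolute deviation) of predicted vs reference."""
    diffs = [p - r for r, p in zip(reference, predicted)]
    n = len(diffs)
    return sum(diffs) / n, sum(abs(d) for d in diffs) / n


def nihss_category(score, mild_max=5, severe_min=16):
    """Map an NIHSS score to 0 (mild), 1 (moderate), or 2 (severe).

    The cut-offs are assumed defaults for this sketch, not taken from the study.
    """
    if score <= mild_max:
        return 0
    if score >= severe_min:
        return 2
    return 1


def weighted_kappa(a, b, n_categories, power=1):
    """Weighted Cohen's kappa for ordinal labels 0..n_categories-1.

    power=1 gives linear disagreement weights, power=2 quadratic weights.
    """
    n = len(a)
    observed = Counter(zip(a, b))  # joint counts of (rater a, rater b)
    row, col = Counter(a), Counter(b)  # marginal counts per rater
    num = den = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = abs(i - j) ** power
            num += w * observed[(i, j)] / n
            den += w * (row[i] / n) * (col[j] / n)
    if den == 0:  # both raters used one identical category throughout
        return 1.0
    return 1.0 - num / den


# Hypothetical scores for five patients (not study data).
neurologist = [2, 7, 12, 18, 22]
llm = [2, 6, 12, 17, 20]

bias, mad = bias_and_mad(neurologist, llm)
kappa = weighted_kappa(
    [nihss_category(s) for s in neurologist],
    [nihss_category(s) for s in llm],
    n_categories=3,
)
print(f"bias={bias:.2f}, MAD={mad:.2f}, weighted kappa={kappa:.3f}")
```

A negative bias in this sketch corresponds to the model scoring lower than the neurologist, which is the direction of underestimation the Results describe for most models.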
Conflict of interest: Rui Lopes: nothing to disclose.
R Domingos Da Costa Lopes
Universidade Federal do Rio de Janeiro
Maria Carlos Pereira
Administração Regional de Saúde de Lisboa e Vale do Tejo
Carolina Gonçalves
Administração Regional de Saúde de Lisboa e Vale do Tejo
European Stroke Journal
Hospital de Santo António
Administração Regional de Saúde de Lisboa e Vale do Tejo
Lopes et al. (Fri,) studied this question.
synapsesocial.com/papers/69fd7e00bfa21ec5bbf06320 — DOI: https://doi.org/10.1093/esj/aakag023.719