Large language models (LLMs) show promise for biomedical text analysis, but their ability to localize the epileptogenic zone (EZ) from presurgical data is underexplored. We evaluated three leading LLMs on predicting the surgically validated EZ using multi-source unstructured clinical text from 154 patients (two cohorts) with drug-resistant epilepsy who achieved postoperative seizure freedom (Engel Class I). The models performed EZ laterality classification, probabilistic lobar localization, and stereoelectroencephalography (SEEG) stratification based on five presurgical text sources, with performance benchmarked against the resected lobes. For laterality, all models achieved 98.3% accuracy in Cohort 1 and 100% in Cohort 2. In lobar localization, GPT-4.1 and Claude 3.7 Sonnet each achieved a median score of 70, significantly higher than DeepSeek-R1’s 60 (p < 0.05). A modality ablation analysis showed that GPT-4.1’s performance was most strongly impaired by removing medical records (p < 0.001) and MRI reports (p = 0.014). Predictions exhibited excellent test-retest reliability (ICC = 0.951), and higher SEEG stratification scores were assigned to patients who underwent invasive monitoring (p < 0.001). In conclusion, LLMs can accurately infer the surgically validated EZ from raw preoperative text and may serve as valuable clinical decision-support tools.
Sun et al. (Mon,) studied this question.
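
The abstract does not state how the laterality accuracy or the lobar localization score is computed. As a minimal sketch, assume that a model distributes 100 points across candidate lobes and that a patient's score is the number of points falling on the resected lobe(s). That scoring rule, the function names, and the toy data below are illustrative assumptions, not the authors' method.

```python
from statistics import median


def laterality_accuracy(predicted, resected):
    """Percentage of patients whose predicted side matches the resected side."""
    if len(predicted) != len(resected) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predicted, resected))
    return 100.0 * correct / len(predicted)


def lobar_score(points_by_lobe, resected_lobes):
    """ASSUMED rule: points (out of 100) the model placed on the resected lobe(s)."""
    return sum(points_by_lobe.get(lobe, 0) for lobe in resected_lobes)


def median_lobar_score(predictions, resections):
    """Median of the per-patient lobar scores across a cohort."""
    return median(lobar_score(p, r) for p, r in zip(predictions, resections))


# Hypothetical toy cohort of three patients (not data from the study).
toy_predictions = [
    {"temporal": 70, "frontal": 30},
    {"frontal": 60, "parietal": 40},
    {"occipital": 100},
]
toy_resections = [{"temporal"}, {"frontal"}, {"temporal"}]

cohort_median = median_lobar_score(toy_predictions, toy_resections)
```

Under this assumed rule, a score of 70 would mean that the model placed 70 of its 100 points on the lobe that was actually resected. Other scoring schemes, such as rank-based credit, would give different values.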