An optimized zero-shot LLM framework classified germline BRCA status from unstructured EHR notes with 86.1% accuracy and a weighted F1-score of 0.88 in the development cohort, and with 82.4% accuracy in the validation cohort.
Can a zero-shot LLM framework accurately classify germline BRCA mutation status from unstructured EHR notes in metastatic breast cancer patients?
A systematic zero-shot LLM framework with iterative prompt engineering can accurately extract complex genetic data, such as germline BRCA status, from unstructured clinical notes.
Abstract
Background: Vast amounts of critical precision oncology data, including results for actionable findings such as germline BRCA (gBRCA) mutations, remain hidden in unstructured electronic health record (EHR) notes. We sought to develop and evaluate a systematic, zero-shot large language model (LLM) framework to accurately classify patients' gBRCA mutation status from real-world EHR clinical note narratives.
Methods: In this retrospective study at a large, single academic cancer center, we used two distinct cohorts randomly sampled from a population of metastatic breast cancer patients whose genetic data were confined to unstructured notes. The development cohort (n=439) was used for model optimization, and an independent validation cohort (n=510) was used for objective performance evaluation. To select the base model, we evaluated three LLMs with standardized prompts to ensure an unbiased comparison. The best-performing model then underwent further optimization of its input data and iterative prompt engineering to improve performance on a granular 11-class schema. Key performance metrics included classification accuracy and F1-score (the harmonic mean of precision and recall).
Results: The final optimized framework achieved an accuracy of 86.1% and a weighted F1-score of 0.88 on the 11-class task in the development cohort. Crucially, the model demonstrated robust performance and generalization on the validation cohort, with an accuracy of 82.4% and a weighted F1-score of 0.81. A qualitative error analysis of the validation (test) cohort showed that many discrepancies were attributable to heterogeneous clinical annotation criteria between the two cohorts. On the development cohort, substantial performance gains were attributed to prompt engineering, which improved accuracy by 7.5 percentage points, from 78.6% to 86.1%.
Conclusion: Zero-shot LLMs, when applied within a systematic framework characterized by multi-phase optimization of inputs and iterative engineering of prompts that enforce step-by-step reasoning, can accurately and reliably extract critical genetic data from unstructured EHR notes. This approach provides a reproducible and scalable blueprint for developing trustworthy clinical artificial intelligence methods to accelerate research and precision oncology.
Citation Format: W. Zhu, A. Gutierrez, L. Hsu, D. Tripathy, J. Litton, B. Arun, C. Barcenas, A. Singareeka Raghavendra. Classifying Germline BRCA Status from Unstructured Electronic Health Record Notes: A Systematic Prompt Engineering Approach [abstract]. In: Proceedings of the San Antonio Breast Cancer Symposium 2025; 2025 Dec 9-12; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2026;32(4 Suppl):Abstract nr PS3-04-05.
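The abstract reports both accuracy and a weighted F1-score for the 11-class task. These two numbers can differ because weighted F1 averages each class's F1 (the harmonic mean of its precision and recall) with weights proportional to that class's support, meaning the number of reference labels in that class. The sketch below assumes the standard support-weighted definition, the same one scikit-learn uses for `average="weighted"`. The abstract does not say how the authors computed the metric. The label names are hypothetical placeholders and are not the study's 11-class schema.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of notes whose predicted class matches the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (assumed standard definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    support = Counter(y_true)      # reference count per class (the weights)
    predicted = Counter(y_pred)    # how often each class was predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    score = 0.0
    for label, n_true in support.items():
        tp = true_pos[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total  # classes absent from y_true get weight 0
    return score


# Hypothetical labels for illustration only.
y_true = ["gBRCA1_positive"] * 3 + ["gBRCA2_positive"] * 2 + ["gBRCA_negative"]
y_pred = ["gBRCA1_positive", "gBRCA1_positive", "gBRCA2_positive",
          "gBRCA2_positive", "gBRCA_negative", "gBRCA_negative"]

print(f"accuracy    = {accuracy(y_true, y_pred):.3f}")     # accuracy    = 0.667
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")  # weighted F1 = 0.678
```

In this toy example, four of the six predictions are correct, so accuracy is 0.667. The weighted F1 of 0.678 is slightly higher because the largest class scores best on F1 (0.8) and therefore contributes the most weight.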