What question did this study set out to answer?

The study aims to develop a framework for classifying germline BRCA mutation status from unstructured EHR notes using language models.

February 22, 2026

Abstract PS3-04-05: Classifying Germline BRCA Status from Unstructured Electronic Health Record Notes: A Systematic Prompt Engineering Approach

Key Result

An optimized zero-shot LLM framework classified germline BRCA status from unstructured EHR notes with 86.1% accuracy and a weighted F1-score of 0.88 in the development cohort, and with 82.4% accuracy and a weighted F1-score of 0.81 in the validation cohort.

Key Points

The study aimed to develop a framework for classifying germline BRCA mutation status from unstructured EHR notes using large language models.
Two cohorts were randomly sampled from metastatic breast cancer patients whose genetic data were confined to unstructured notes.
The best-performing of three candidate LLMs was optimized through input optimization and iterative prompt engineering.
Performance was evaluated by classification accuracy and F1-score.
The development cohort yielded an accuracy of 86.1% and a weighted F1-score of 0.88.
The validation cohort yielded an accuracy of 82.4% and a weighted F1-score of 0.81.
Prompt engineering raised development-cohort accuracy by 7.5 percentage points, from 78.6% to 86.1%.
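
The reported gain from prompt engineering is an absolute difference between two accuracies (78.6% and 86.1%, as given in the abstract), so it is best read as percentage points rather than a relative change. A minimal sketch of the arithmetic follows; the function name `accuracy_gain` is an illustrative assumption, not something from the study.

```python
def accuracy_gain(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_pp = after_pct - before_pct
    relative_pct = absolute_pp / before_pct * 100
    return absolute_pp, relative_pct


# Accuracies reported for the development cohort before and after prompt engineering.
absolute_pp, relative_pct = accuracy_gain(78.6, 86.1)
print(f"absolute gain: {absolute_pp:.1f} percentage points")
print(f"relative gain: {relative_pct:.1f}%")
```

The two numbers answer different questions: the absolute gain says how many more notes per hundred were classified correctly, while the relative gain expresses that improvement as a fraction of the starting accuracy.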

Structured PICO

Can a zero-shot LLM framework accurately classify germline BRCA mutation status from unstructured EHR notes in metastatic breast cancer patients?

Population

949 metastatic breast cancer patients whose genetic data were confined to unstructured notes at a large single academic cancer center (development cohort n=439, validation cohort n=510).

Intervention

Systematic, zero-shot large language model (LLM) framework with multi-phase optimization of inputs and iterative prompt engineering.

Outcome

Classification accuracy and F1-score for germline BRCA mutation status on an 11-class schema.
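
For a multi-class task, a weighted F1-score is commonly computed as the per-class F1 (the harmonic mean of precision and recall) averaged with weights proportional to each class's number of true instances. The sketch below assumes that definition; the abstract does not spell out its exact formula, and the three labels in the toy data are hypothetical stand-ins for the 11-class schema.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total
    return score


# Toy data with hypothetical labels, not study data.
y_true = ["pos", "pos", "neg", "neg", "neg", "unk"]
y_pred = ["pos", "neg", "neg", "neg", "unk", "unk"]
print(f"accuracy:    {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

Weighting by support means frequent classes dominate the score, so a rare category in an 11-class schema can be classified poorly without moving the weighted F1 much.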

A systematic zero-shot LLM framework with iterative prompt engineering can accurately extract complex genetic data like germline BRCA status from unstructured clinical notes.

Abstract

Background: Vast amounts of critical precision oncology data, including results for actionable findings like germline BRCA (gBRCA) mutations, remain hidden in unstructured electronic health record (EHR) notes. We sought to develop and evaluate a systematic, zero-shot large language model (LLM) framework to accurately classify gBRCA mutation status in patients using real-world EHR clinical note narratives.

Methods: In this retrospective study at a large single academic cancer center, we utilized two distinct cohorts randomly sampled from a population of metastatic breast cancer patients whose genetic data were confined to unstructured notes. The development cohort (n=439) was used for model optimization, and an independent validation cohort (n=510) was used for objective performance evaluation. Our framework involved selecting an optimal base model by evaluating three LLMs with standardized prompts to ensure an unbiased comparison. The best-performing model then underwent further optimization of its input data and iterative prompt engineering to enhance performance on a granular 11-class schema. Key performance metrics included classification accuracy and F1-score (the harmonic mean of precision and recall).

Results: Our final optimized framework achieved an accuracy of 86.1% and a weighted F1-score of 0.88 on the 11-class task in our development cohort. Crucially, the model demonstrated robust performance and generalization on our validation cohort, with an accuracy of 82.4% and a weighted F1-score of 0.81. A qualitative error analysis of the test cohort clarified that many discrepancies were attributable to heterogeneous clinical annotation criteria between the two cohorts. Significant performance gains on the development cohort were attributed to prompt engineering, which improved accuracy by 7.5 percentage points, from 78.6% to 86.1%.

Conclusion: Zero-shot LLMs, when applied within a systematic framework characterized by multi-phase optimization of inputs and iterative engineering of prompts that enforce step-by-step reasoning, can accurately and reliably extract critical genetic data from unstructured EHR notes. This approach provides a reproducible and scalable blueprint for developing trustworthy clinical artificial intelligence methodology to accelerate research and precision oncology.

Citation Format: W. Zhu, A. Gutierrez, L. Hsu, D. Tripathy, J. Litton, B. Arun, C. Barcenas, A. Singareeka Raghavendra. Classifying Germline BRCA Status from Unstructured Electronic Health Record Notes: A Systematic Prompt Engineering Approach [abstract]. In: Proceedings of the San Antonio Breast Cancer Symposium 2025; 2025 Dec 9-12; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2026;32(4 Suppl):Abstract nr PS3-04-05.
