What question did this study set out to answer?

The research aims to create a machine learning model to classify breast cancer receptor subtypes using genomic data rather than traditional immunohistochemistry methods.

April 5, 2026

Abstract 2724: A machine learning approach to classify breast cancer receptor subtype using genomic features.

What does this research mean for the field?

A machine learning classifier using genomic features from whole exome sequencing predicted the four major breast cancer receptor subtypes with 80.3% overall agreement against HR/HER2 status inferred from medication claims data. Novelty: methodological. Consensus alignment: neutral.

Key Result

A Random Forest machine learning classifier using whole exome sequencing genomic features predicted four major breast cancer receptor subtypes with 80.3% overall agreement.

Key Points

The research aims to create a machine learning model to classify breast cancer receptor subtypes using genomic data rather than traditional immunohistochemistry methods.
Analyzed genomic features using data from 19,559 breast cancer patients.
Developed a feature set from 19,820 genes based on whole exome sequencing.
Trained a Random Forest classifier using a stratified train-test split of 75/25.
Utilized treatment codes to determine hormone receptor and HER2 subtype.
Achieved 80.3% overall agreement with HR/HER2 status from medication claims data.
For HR+/HER2-, precision was 0.935, recall 0.911, F1 score 0.923.
For HR-/HER2+, precision was 0.714, recall 0.753, F1 score 0.783.
For HR+/HER2+, precision was 0.748, recall 0.734, F1 score 0.741.
For TNBC subtype, precision was 0.730, recall 0.816, F1 score 0.770.
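The stratified 75/25 split listed above can be sketched in a few lines. This is a minimal, standard-library illustration and not the study's code: the function name `stratified_split`, the seed, and the toy subtype counts are all assumptions made for the example. Splitting within each subtype keeps the rarer classes represented in both cohorts at roughly their original share.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), sampling the test part within each label."""
    rng = random.Random(seed)

    # Group sample indices by their subtype label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction of every subtype for the test cohort.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy cohort with made-up subtype counts (NOT the study's distribution).
labels = (
    ["HR+/HER2-"] * 60
    + ["HR+/HER2+"] * 16
    + ["HR-/HER2+"] * 8
    + ["TNBC"] * 16
)
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

In practice, scikit-learn's `train_test_split` with its `stratify` argument performs the same kind of split; the hand-written version is shown only to make the mechanics visible.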

Structured PICO

Can a machine learning classifier using genomic features accurately predict breast cancer receptor subtypes without relying on immunohistochemistry or gene expression data?

Population

19,559 patients with primary breast cancer, identified using a proprietary real-world database linked to a clinical claims database

Intervention

Random Forest machine learning classifier using somatic mutations across 19,820 genes from whole exome sequencing (WES) data

Comparator

Hormone receptor (HR) and HER2 subtype inferred through medication claims data

Outcome

Overall agreement with HR/HER2 status
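For readers unfamiliar with the metrics, overall agreement is the fraction of patients whose predicted subtype matches the reference label, and F1 is the harmonic mean of precision and recall for one subtype. The sketch below computes these from two label lists; the function name `agreement_report` and the toy labels are assumptions for illustration and do not reproduce the study's data.

```python
from collections import Counter


def agreement_report(reference, predicted):
    """Return overall agreement and per-class precision, recall, and F1."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    # Count every (reference label, predicted label) pair.
    pairs = Counter(zip(reference, predicted))
    classes = sorted(set(reference) | set(predicted))

    overall = sum(pairs[(c, c)] for c in classes) / len(reference)

    per_class = {}
    for c in classes:
        true_pos = pairs[(c, c)]
        n_predicted = sum(pairs[(r, c)] for r in classes)
        n_reference = sum(pairs[(c, p)] for p in classes)
        precision = true_pos / n_predicted if n_predicted else 0.0
        recall = true_pos / n_reference if n_reference else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    return overall, per_class


# Toy labels (hypothetical): one error in each direction between two subtypes.
reference = ["HR+/HER2-"] * 4 + ["TNBC"] * 4
predicted = ["HR+/HER2-"] * 3 + ["TNBC"] + ["TNBC"] * 3 + ["HR+/HER2-"]
overall, per_class = agreement_report(reference, predicted)
print(overall, per_class["TNBC"])
```

Note that in this study the reference labels are themselves inferred from medication claims, so agreement measures consistency with that proxy rather than with clinically reported receptor status.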

A machine learning classifier using genomic features from whole exome sequencing can accurately predict breast cancer receptor subtypes, offering a potential alternative when tissue samples are inadequate for standard immunohistochemistry.

Abstract

Risk stratification, treatment course, and prognosis for patients with breast cancer presently rely upon the accurate determination of receptor subtype, ascertained through immunohistochemistry (IHC) for estrogen receptor (ER) and progesterone receptor (PR), and evaluation of HER2 expression (IHC and/or gene amplification via in situ hybridization). While IHC-based subtyping assays are informative, they require high-quality tissue samples, and the technical assays can be susceptible to fixation artifacts, variability in antibody staining performance, and semi-quantitative and subjective result calling. In cases of diminished sample quality, IHC-based subtype assessment may not agree with gene expression-based classification, and alternative approaches may be needed. This study aimed to develop a machine learning classifier able to predict breast cancer receptor subtypes using genomic features, without relying on immunohistochemistry or gene expression data. This study included 19,559 patients with primary breast cancer, identified using Natera's proprietary real-world database, linked to a clinical claims database. Hormone receptor (HR) and HER2 subtype was determined from patient treatment codes. We developed a biologically-informed feature set by combining somatic mutations across 19,820 genes, using whole exome sequencing (WES) data from the Signatera™ testing workflow. Each mutation was assigned a composite mutation_score (range 1-12) based on variant class (SNV, insertion, deletion), superclass (SNP/INDEL), predicted impact (VEP annotation impact: MODIFIER to HIGH), and functional consequence (such as frameshift, stop-gain, missense, synonymous). A Random Forest classifier was trained with a stratified 75/25 train-test split and hyperparameter optimization. The model was trained on features from 14,669 patients in the training cohort. In a test cohort of 4,890 patients, the model achieved 80.3% overall agreement with HR/HER2 status as inferred through medication claims data, with balanced performance across four major subtypes. Per-subtype metrics were: for HR+/HER2-, the model showed a precision of 0.935, recall of 0.911, and F1 score of 0.923; for HR-/HER2+, precision was 0.714, recall was 0.753, and F1 score was 0.783; for HR+/HER2+, precision was 0.748, recall was 0.734, and F1 score was 0.741; lastly, for the TNBC subtype, precision was 0.730, recall was 0.816, and F1 score was 0.770. Overall, the genomic classifier accurately classifies breast cancer into one of the four major receptor subtypes. After definitive validation against clinically-reported HR/HER2 status, this classifier could be used to guide analyses of de-identified genomic datasets that lack complete clinical annotation.

Citation Format: Sandro Satta, Philip Miller, Samuel Rivero-Hinojosa, Ekaterina Kalashnikova, Angel Rodriguez, Minetta C. Liu. A machine learning approach to classify breast cancer receptor subtype using genomic features [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 2724.
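The abstract names the inputs to the composite mutation score (variant class, superclass, VEP impact, functional consequence) and its 1-12 range, but not the scoring rule. The sketch below is therefore purely hypothetical: it uses only two of the four inputs, multiplies an assumed impact rank (1-4) by an assumed consequence tier (1-3) to land in the same 1-12 range, and collapses each gene to its highest-scoring mutation. The weights, the function names, the per-gene maximum, and the three-gene panel are all illustrative assumptions, not the authors' method.

```python
# Hypothetical weights: the abstract lists the inputs but not how they combine.
IMPACT_RANK = {"MODIFIER": 1, "LOW": 2, "MODERATE": 3, "HIGH": 4}
CONSEQUENCE_TIER = {
    "synonymous": 1,
    "missense": 2,
    "frameshift": 3,
    "stop_gained": 3,
}


def mutation_score(impact, consequence):
    """Illustrative score in 1..12: impact rank times consequence tier."""
    # Consequences not listed above fall back to the lowest tier.
    return IMPACT_RANK[impact] * CONSEQUENCE_TIER.get(consequence, 1)


def gene_features(mutations, genes):
    """Build one feature per gene: the highest mutation score, or 0 if unmutated."""
    features = dict.fromkeys(genes, 0)
    for mutation in mutations:
        gene = mutation["gene"]
        if gene in features:
            score = mutation_score(mutation["impact"], mutation["consequence"])
            features[gene] = max(features[gene], score)
    return [features[gene] for gene in genes]


# Made-up patient record over a tiny gene panel (the study used 19,820 genes).
genes = ["TP53", "PIK3CA", "ERBB2"]
patient = [
    {"gene": "TP53", "impact": "HIGH", "consequence": "stop_gained"},
    {"gene": "TP53", "impact": "MODERATE", "consequence": "missense"},
    {"gene": "PIK3CA", "impact": "MODERATE", "consequence": "missense"},
]
row = gene_features(patient, genes)
print(row)  # [12, 6, 0]
```

A row like this, one per patient, is the kind of fixed-length numeric vector a Random Forest classifier can consume directly.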


Cite This Study

Satta et al. reported that a Random Forest machine learning classifier using whole exome sequencing genomic features predicted four major breast cancer receptor subtypes with 80.3% overall agreement.

synapsesocial.com/papers/69d1fd8ea79560c99a0a3aa4 https://doi.org/10.1158/1538-7445.am2026-2724
