Abstract

Background: Cancers of Unknown Primary (CUP) account for approximately 2-5% of malignancies and carry a poor prognosis because treatment is largely empirical. Existing genomic classifiers achieve only approximately 60-75% accuracy. The AACR Project GENIE consortium and database provide large-scale, real-world genomic data, but transcriptomic features have not yet been integrated with AACR GENIE data. We hypothesized that developing and validating a multi-omic framework on TCGA data could provide an approach for integrating transcriptomic data into GENIE, with the expected outcome of improving the precision of tumor-of-origin classification.

Methods: We examined a cohort of 1,556 solid tumors from the TCGA PanCancer Atlas, accessed via cBioPortal, spanning four cancer types. The held-out test set contained COAD (n=76), HNSC (n=103), LUAD (n=102), and READ (n=31) samples. Two XGBoost classifiers were trained with SMOTE on an 80/20 train/test split: (1) a Genomic-Only baseline classifier (Tumor Mutation Burden, MSI Score, Aneuploidy Score) and (2) a Multi-Omic classifier (the same genomic features plus 20,506 RNA-Seq genes). Performance was evaluated on the held-out test set (n=312).

Results: The Multi-Omic classifier achieved 93.0% accuracy compared with 52.2% for the Genomic-Only classifier (a 40.7 percentage-point improvement). Per-class F1 scores for the Multi-Omic classifier were COAD=0.86, HNSC=1.00, LUAD=1.00, and READ=0.58. The baseline classifier demonstrated near-random performance (COAD F1=0.50 and READ F1=0.33). Feature importance analysis identified known lineage markers among the top predictive features: KRT5 (9.2% importance, HNSC squamous marker), SFTPB (6.1% importance, LUAD lung surfactant), GPA33 (2.9% importance, COAD/READ intestinal marker), CDX1 (2.7% importance, intestinal transcription factor), HOXB13, NAPSA, and EVX2. This indicates that the model captured biologically meaningful patterns.

Conclusions: Multi-omic integration substantially improves tumor-of-origin classification compared with genomic-only methods. The model's reliance on established, tissue-specific biomarkers provides biological validity critical for diagnostics. READ performance was lower (F1=0.58) owing to the limited number of samples and the similarity of READ to COAD. The framework showed good discrimination across the other cancer types. This TCGA proof-of-concept establishes a validated pipeline for expanding to 15-20 cancer types using AACR Project GENIE, helping advance clinical diagnosis of CUP and precision oncology.

Citation Format: Pranav Gadde, Naga Mudda, Pratayanch Sav, Krithik Senthilkumar, Aadarsh Sivaraman, Krithik Mudda. Integrating transcriptomic data to improve multi-omic tumor-of-origin classification [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 13.
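The figures reported in the abstract can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only and is not the authors' code. It takes the test-set class counts, the two accuracies, and the Multi-Omic per-class F1 scores from the abstract, then derives the test-set size, the accuracy gain in percentage points, and two summary scores. The macro-averaged and support-weighted F1 values are not reported in the abstract; they are computed here from the rounded per-class figures, and the helper names `macro_f1` and `weighted_f1` are hypothetical.

```python
# Test-set class counts and Multi-Omic per-class F1, as reported in the abstract.
support = {"COAD": 76, "HNSC": 103, "LUAD": 102, "READ": 31}
f1_multi_omic = {"COAD": 0.86, "HNSC": 1.00, "LUAD": 1.00, "READ": 0.58}


def macro_f1(f1):
    """Unweighted mean of per-class F1 scores (derived, not reported)."""
    return sum(f1.values()) / len(f1)


def weighted_f1(f1, support):
    """Mean of per-class F1 scores weighted by class support (derived, not reported)."""
    total = sum(support.values())
    return sum(f1[c] * support[c] for c in f1) / total


n_test = sum(support.values())          # should match the stated n=312
test_fraction = n_test / 1556           # share of the 1,556-tumor cohort held out
gain_points = round(93.0 - 52.2, 1)     # absolute gain in percentage points

print(f"test set size: {n_test}")
print(f"held-out fraction: {test_fraction:.1%}")
print(f"accuracy gain: {gain_points} percentage points")
print(f"macro F1: {macro_f1(f1_multi_omic):.2f}")
print(f"support-weighted F1: {weighted_f1(f1_multi_omic, support):.3f}")
```

The class counts sum to 312, about 20% of the cohort, which agrees with the 80/20 split. The rounded accuracies differ by 40.8 points, consistent with the reported 40.7 up to rounding and confirming that the improvement is an absolute difference rather than a relative one. The macro F1 (0.86) sits below the support-weighted value (about 0.92) because READ, the smallest class, has the lowest score.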
Pranav Gadde
N. Mudda
Pratayanch Sav
Cancer Research
Illinois Mathematics and Science Academy
www.synapsesocial.com/papers/69d1fcfda79560c99a0a2c4c — DOI: https://doi.org/10.1158/1538-7445.am2026-13