Abstract

Background: Cancers of unknown primary (CUP) account for approximately 2-5% of malignancies and carry a poor prognosis because therapy is largely empirical. Existing genomic classifiers achieve only approximately 60-75% accuracy. The AACR Project GENIE consortium provides large-scale, real-world genomic data, but its database does not currently include integrated transcriptomic features. We hypothesized that a multi-omic framework developed and validated on TCGA data could provide an approach for integrating transcriptomic data into GENIE and thereby improve the precision of tumor-of-origin classification.

Method: We examined a cohort of 1,556 solid tumors from the TCGA PanCancer Atlas, accessed via cBioPortal, spanning four cancer types (held-out test-set counts in parentheses): colon adenocarcinoma (COAD, n=76), head and neck squamous cell carcinoma (HNSC, n=103), lung adenocarcinoma (LUAD, n=102), and rectum adenocarcinoma (READ, n=31). Two XGBoost classifiers were trained with SMOTE on an 80/20 train/test split: (1) a Genomic-Only baseline using Tumor Mutation Burden, MSI Score, and Aneuploidy Score, and (2) a Multi-Omic classifier using the same genomic features plus 20,506 RNA-Seq genes. Performance was evaluated on the held-out test set (n=312).

Results: The Multi-Omic classifier achieved 93.0% accuracy compared with 52.2% for the Genomic-Only classifier (an absolute improvement of 40.7 percentage points). Per-class F1 scores for the Multi-Omic classifier were COAD=0.86, HNSC=1.00, LUAD=1.00, and READ=0.58. The baseline classifier showed near-random performance on the colorectal classes (COAD F1=0.50, READ F1=0.33). Feature importance analysis identified known lineage markers among the top predictive features: KRT5 (9.2% importance; HNSC squamous marker), SFTPB (6.1%; LUAD lung surfactant gene), GPA33 (2.9%; COAD/READ intestinal marker), CDX1 (2.7%; intestinal transcription factor), HOXB13, NAPSA, and EVX2, indicating that the model relies on biologically meaningful patterns.

Conclusions: Multi-omic integration substantially improves tumor-of-origin classification accuracy compared with genomic-only methods. The model's reliance on established, tissue-specific biomarkers supports the biological validity that diagnostic use requires. READ performance was lower (F1=0.58) due to the limited number of samples and the similarity of READ to COAD. The framework discriminated well among the other cancer types. This TCGA proof-of-concept establishes a validated pipeline that can be expanded to 15-20 cancer types using AACR Project GENIE, helping advance the clinical diagnosis of CUP and precision oncology.

Citation Format: Pranav Gadde, Naga Mudda, Pratayanch Sav, Krithik Senthilkumar, Aadarsh Sivaraman, Krithik Mudda. Integrating transcriptomic data to improve multi-omic tumor-of-origin classification [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 13.
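The per-class figures in the Results can be combined into summary numbers with a few lines of arithmetic. The sketch below uses only values stated in the abstract (test-set counts, Multi-Omic F1 scores, and the two accuracies). The macro-averaged and support-weighted F1 values it derives are not reported by the authors and are shown only as an illustration of how the small READ class affects each average.

```python
# Per-class held-out test counts and Multi-Omic F1 scores, as reported in the abstract.
support = {"COAD": 76, "HNSC": 103, "LUAD": 102, "READ": 31}
f1 = {"COAD": 0.86, "HNSC": 1.00, "LUAD": 1.00, "READ": 0.58}

# The per-class counts should add up to the stated test-set size (n=312).
n_test = sum(support.values())

# Macro F1 treats every class equally; weighted F1 scales each class by its support.
# Neither average is reported in the abstract; both are derived here.
macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(support[c] * f1[c] for c in f1) / n_test

# Absolute accuracy gap in percentage points, computed from the rounded accuracies.
gap_points = 93.0 - 52.2

print(f"held-out test samples: {n_test}")
print(f"macro-averaged F1:     {macro_f1:.2f}")
print(f"support-weighted F1:   {weighted_f1:.2f}")
print(f"accuracy gap:          {gap_points:.1f} percentage points")
```

Because READ contributes only 31 of the 312 test samples, its lower F1 pulls the macro average down more than the support-weighted one. The gap computed from the rounded accuracies is 40.8 points; the 40.7 in the abstract presumably reflects unrounded values.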
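The Method states that SMOTE was used to balance the classes during training but does not say how it was configured. As a rough illustration of the idea only, the sketch below implements the core SMOTE step in plain Python: a synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote_like_oversample`, the value of `k`, and the feature rows are all hypothetical; this is not the authors' pipeline, which more plausibly relied on an existing library implementation.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority-class
    row and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the remaining minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 3-feature rows standing in for an under-represented class
# (for example TMB, MSI score, aneuploidy score); the values are made up.
read_like = [
    (1.0, 0.20, 10.0),
    (1.4, 0.30, 12.0),
    (0.8, 0.10, 9.0),
    (1.1, 0.25, 11.0),
]
new_rows = smote_like_oversample(read_like, n_new=6)
print(len(new_rows), "synthetic rows generated")
```

Each synthetic row is a convex combination of two real rows, so every feature stays within the range observed in the minority class. Oversampling of this kind is normally applied to the training split only, so that the held-out test set contains real samples exclusively.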