Introduction: Cancer of unknown primary (CUP) remains a major diagnostic hurdle, compromising therapies that depend on accurately identifying the tissue of origin. We present TCUP, an ensemble learning framework that combines Contrastive Autoencoders (CAE) and Siamese Neural Networks (SNN) with base classifiers and a meta-learning layer to classify and interpret CUP, adding biological insight through Monte-Carlo ablations.

Methods: Gene-expression data from TCGA (tumour), GTEx (normal), and the Genome Sciences Centre (metastatic) were imputed, log-transformed, and SMOTE-balanced. An SNN and a CAE learned pairwise and reconstruction embeddings, respectively. Multiple base classifiers (e.g., SVM, Random Forest) generated meta-features, which a meta-learner combined into the final prediction. Repeated Monte-Carlo ablation iterations were used to assess gene-level importance.

Results: TCUP achieved 98.3 % accuracy (F1 = 98.3) across all tissues. In metastatic BRCA, COAD, and PAAD it reached 86.7 % accuracy. Ablation highlighted 79 key contributors, including the established tumour suppressors NKX6-1 and SOX30 and the less-studied SYTL1. PCA confirmed clearer separation in the embedded space.

Conclusion: TCUP delivers high tissue-of-origin accuracy and CUP assignment while providing interpretable gene importance that clarifies tissue differences and metastatic drivers. By integrating advanced embeddings with systematic ablation, TCUP supplies an accessible framework to advance CUP research and, ultimately, improve clinical outcomes. TCUP is freely available at https://fohs.bgu.ac.il/rubinlab/TCUP/
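The abstract does not spell out how the Monte-Carlo ablation is carried out, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "ablating" a gene means setting its expression to zero, that importance is the mean accuracy drop over the random iterations in which a gene was ablated, and it uses a toy nearest-centroid classifier as a stand-in for the TCUP ensemble. All function names (`fit_centroids`, `monte_carlo_ablation`, and so on) and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Toy stand-in classifier: one mean expression vector per tissue label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def accuracy(centroids, X, y):
    return mean(1.0 if predict(centroids, x) == lab else 0.0
                for x, lab in zip(X, y))


def monte_carlo_ablation(centroids, X, y, n_iter=200, n_ablate=2, seed=0):
    """Score each gene by the mean accuracy drop over the random
    iterations in which it was ablated (assumption: ablation = zeroing)."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    baseline = accuracy(centroids, X, y)
    drops = [[] for _ in range(n_genes)]
    for _ in range(n_iter):
        ablated = set(rng.sample(range(n_genes), n_ablate))
        X_abl = [[0.0 if j in ablated else v for j, v in enumerate(x)]
                 for x in X]
        drop = baseline - accuracy(centroids, X_abl, y)
        for j in ablated:
            drops[j].append(drop)
    return [mean(d) if d else 0.0 for d in drops]


# Toy data (hypothetical): gene 0 separates the two tissues,
# genes 1-5 carry the same values in both tissues.
shared = [
    [1.0, 2.0, 3.0, 1.0, 2.0],
    [2.0, 3.0, 1.0, 2.0, 1.0],
    [3.0, 1.0, 2.0, 3.0, 3.0],
]
X = [[4.0] + row for row in shared] + [[0.0] + row for row in shared]
y = ["tissue_A"] * 3 + ["tissue_B"] * 3

model = fit_centroids(X, y)
scores = monte_carlo_ablation(model, X, y)
ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("genes ranked by ablation importance:", ranked)
```

In this toy setting the only informative gene ranks first, while the uninformative genes receive smaller scores that come purely from being ablated together with it. Ablating several genes per iteration is a design choice of this sketch; it is what lets the random sampling probe combinations rather than single genes.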
Ohad Landau
Eitan Rubin
Ben-Gurion University of the Negev
Landau et al. studied this question.
www.synapsesocial.com/papers/68a366930a429f797332be9f — DOI: https://doi.org/10.1101/2025.08.08.669066