A multi-feature genome-wide cfDNA machine learning classifier achieved 80% top 1 accuracy and 90% top 2 accuracy for cancer tissue-of-origin identification in an independent validation cohort.
Observational (n=3,073)
Does a machine learning classifier using multi-feature genome-wide cfDNA profiling accurately identify the tissue of origin in cancer patients?
1,814 patients across 17 cancer types in the training cohort, 1,221 patients in the independent validation cohort, plus additional cohorts of cancers of unknown primary (n=15) and multiple primary cancers (n=23).
Machine learning classifier using whole genome cfDNA profiles integrating 11 distinct fragmentomic, genomic, epigenomic, and microbiomic features.
Top 1 and top 2 accuracy of cancer tissue of origin (TOO) prediction (surrogate outcome).
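The outcome measure above can be illustrated with a minimal sketch of top-k accuracy: a sample counts as correct when its true tissue of origin appears among the classifier's k highest-ranked predictions. The function name `top_k_accuracy`, the tissue labels, and the ranked lists are hypothetical toy values for illustration; they are not data or code from the study.

```python
from typing import Sequence


def top_k_accuracy(
    true_labels: Sequence[str],
    ranked_predictions: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of samples whose true label is among the first k ranked predictions."""
    if len(true_labels) != len(ranked_predictions):
        raise ValueError("true_labels and ranked_predictions must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(
        1
        for truth, ranked in zip(true_labels, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical toy data: four samples, each with predictions ranked from
# most to least likely. These values are made up for illustration only.
true_labels = ["lung", "liver", "colorectal", "breast"]
ranked_predictions = [
    ["lung", "breast", "liver"],       # correct at rank 1
    ["colorectal", "liver", "lung"],   # correct at rank 2
    ["colorectal", "lung", "liver"],   # correct at rank 1
    ["lung", "liver", "breast"],       # correct at rank 3
]

top1 = top_k_accuracy(true_labels, ranked_predictions, k=1)
top2 = top_k_accuracy(true_labels, ranked_predictions, k=2)
print(f"top 1 accuracy: {top1:.2f}, top 2 accuracy: {top2:.2f}")
```

Top 2 accuracy can never be lower than top 1 accuracy on the same samples, because every rank-1 hit is also a hit within the first two ranks.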
A cfDNA-based machine learning classifier integrating multiple genomic and epigenomic features accurately identifies cancer tissue of origin, demonstrating robust performance even in low-ctDNA and diagnostically challenging cases.
Abstract
Background: Determination of the tissue of origin (TOO) of cancer is essential for appropriate clinical management and treatment selection. Liquid biopsy using circulating cell-free DNA (cfDNA) offers a non-invasive approach for cancer detection and TOO prediction. Circulating tumor DNA (ctDNA), a tumor-derived fraction of cfDNA, carries genomic and epigenomic signatures reflective of its origin. Recent advances in machine learning have enabled the development of models to predict TOO from cfDNA profiles. However, current methods show variable performance, particularly in samples with low ctDNA fractions (ctDNA <3%), and accuracy remains inconsistent across different cancer types.
Methods: We developed a TOO classifier using whole genome cfDNA profiles from 1,814 patients across 17 cancer types. Multiple distinct cfDNA features were extracted to reveal diverse cancer-associated alterations, including copy number variations, repeat elements, fragment end motifs associated with DNA methylation, fragment size distribution and coverage, microsatellite instability, mutational signatures, nucleosome occupancy, tissue-specific fragmentation patterns, and the presence of cancer-associated viral DNA. Model performance was evaluated in an independent external cohort of 1,221 patients. Additional tests were conducted in cohorts of patients with cancers of unknown primary (CUP) and multiple primary cancers (MPC).
Results: Our cancer classifier achieved an overall top 1 accuracy of 78% and top 2 accuracy of 89% in the training cohort, with consistently high accuracy across all cancer types. In the independent validation cohort, the model maintained robust performance, with top 1 and top 2 accuracies of 80% and 90%, respectively. Sensitivity increased with advancing cancer stage, improving from 66.8% in stage I to 86.2% in stage IV. Among 612 low-ctDNA samples, 435 cases (71.1%) were correctly classified. The classifier also showed strong potential in CUP, with 11 of 15 cases (73.3%) aligning with the clinically suspected primary site. Furthermore, among 20 MPC cases with two primary sites, both were correctly identified within the top three predictions in 9 cases. In 3 MPC patients with three primary sites, two of the three sites were accurately captured among the top three predictions.
Conclusion: Our cfDNA-based machine learning classifier provides a robust, non-invasive approach for accurate cancer tissue-of-origin identification. Integrating 11 distinct cfDNA-derived fragmentomic, genomic, epigenomic, and microbiomic features, the model achieved high accuracy across multiple cancer types and maintained strong performance in low-ctDNA samples. Its promising results in CUP and MPC further highlight its potential clinical utility in resolving diagnostically challenging cases and guiding precision oncology applications.
Citation Format: Yunjian Zhang, Liang Liu, Hua Bao, Haimeng Tang, Ke Xu, Hao Zhang, Song Wang, Shuang Chang, Dongqin Zhu, Zongyao Huang, Zheng Wang, Liu Yang, Bingzhong Zhang, Ji Tao, Wenhua Liang, Jierong Chen, Shanshan Yang, Xue Wu, Yang Shao, Wenquan Wang, Dongyuan Zhu. Machine learning classifier for cancer type identification via multi-feature genome-wide cfDNA profiling [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 1123.
Yunjian Zhang
Liang Liu
Hua Bao
Cancer Research
Fudan University
Sun Yat-sen University
Guangzhou Medical University
Zhang et al. (2026) conducted an observational study in cancer patients (n=3,073). A machine learning classifier based on multi-feature genome-wide cfDNA profiling was evaluated on top 1 accuracy for cancer tissue-of-origin identification in the independent validation cohort. The classifier achieved 80% top 1 accuracy and 90% top 2 accuracy in that cohort.
synapsesocial.com/papers/69d1fc8ea79560c99a0a2221 — DOI: https://doi.org/10.1158/1538-7445.am2026-1123