What is the clinical evidence from this study?

Study design: Observational. Population: Patients with cancer (n=3,073). Intervention: Machine learning classifier via multi-feature genome-wide cfDNA profiling. Primary outcome: Top 1 accuracy for cancer tissue of origin identification in the independent validation cohort.

What question did this study set out to answer?

To develop a machine learning classifier that accurately identifies cancer tissue of origin using cfDNA.

April 5, 2026

Abstract 1123: Machine learning classifier for cancer type identification via multi-feature genome-wide cfDNA profiling

Key Result

A multi-feature genome-wide cfDNA machine learning classifier achieved 80% top 1 accuracy and 90% top 2 accuracy for cancer tissue-of-origin identification in an independent validation cohort.

Key Points

Aimed to develop a machine learning classifier that accurately identifies cancer tissue of origin using cfDNA.
Analyzed whole genome cfDNA profiles from 1814 patients across 17 cancer types.
Extracted 11 distinct cfDNA features spanning fragmentomic, genomic, epigenomic, and microbiomic signals.
Evaluated model performance in an independent cohort of 1221 patients and additional cohorts with cancers of unknown primary and multiple primary cancers.
Classifier achieved 78% top 1 accuracy and 89% top 2 accuracy in training cohort.
Maintained 80% top 1 and 90% top 2 accuracy in external validation cohort.
Sensitivity improved from 66.8% in stage I to 86.2% in stage IV cancer.
Correctly classified 71.1% (435/612) of low-ctDNA samples; in cancers of unknown primary, predictions matched the clinically suspected primary site in 11 of 15 cases (73.3%).
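
As a quick consistency check, the percentages above can be recomputed from the raw counts reported in the abstract (435 of 612 low-ctDNA samples; 11 of 15 CUP cases), and the four cohort sizes can be summed to the stated total of 3,073. This is a minimal Python sketch; the helper `pct` is a hypothetical name introduced only for illustration.

```python
# Recompute the quoted percentages from the raw counts given in the abstract.
# The helper name `pct` is hypothetical; only the counts come from the study.

def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

# Cohort sizes reported in the abstract: training, validation, CUP, MPC
training, validation, cup, mpc = 1814, 1221, 15, 23
total = training + validation + cup + mpc

low_ctdna = pct(435, 612)  # low-ctDNA samples correctly classified
cup_agree = pct(11, 15)    # CUP cases matching the clinically suspected site

print(total, low_ctdna, cup_agree)  # prints: 3073 71.1 73.3
```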

Study Design

Type

Observational (n=3,073)

Structured PICO

Does a machine learning classifier using multi-feature genome-wide cfDNA profiling accurately identify the tissue of origin in cancer patients?

Population

1,814 patients across 17 cancer types in the training cohort, 1,221 patients in the independent validation cohort, plus additional cohorts of cancers of unknown primary (n=15) and multiple primary cancers (n=23).

Intervention

Machine learning classifier using whole genome cfDNA profiles integrating 11 distinct fragmentomic, genomic, epigenomic, and microbiomic features.

Outcome

Top 1 and top 2 accuracy of cancer tissue of origin (TOO) prediction (surrogate outcome).
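
Top-k accuracy counts a prediction as correct when the true tissue of origin appears among the classifier's k highest-ranked candidate sites. Top 1 accuracy is therefore ordinary accuracy, and top 2 accuracy also credits cases where the true site is ranked second. The Python sketch below illustrates the metric on made-up labels; the function name and toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   true_labels: Sequence[str], k: int) -> float:
    """Fraction of samples whose true tissue of origin is among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if len(true_labels) == 0:
        raise ValueError("need at least one sample")
    # A sample is a hit if its true label appears in the top-k slice of its ranking.
    hits = sum(label in preds[:k] for preds, label in zip(ranked_predictions, true_labels))
    return hits / len(true_labels)


# Hypothetical toy data: candidate sites per sample, ranked by classifier score
ranked = [
    ["lung", "breast", "colorectal"],
    ["liver", "stomach", "pancreas"],
    ["breast", "ovary", "lung"],
]
truth = ["lung", "stomach", "lung"]

print(top_k_accuracy(ranked, truth, k=1))  # only sample 1 is correct at rank 1
print(top_k_accuracy(ranked, truth, k=2))  # samples 1 and 2 are correct within the top 2
```

Because widening k can only add hits, top-k accuracy never decreases as k grows, which is why a reported top 2 accuracy (e.g., 90%) is always at least the corresponding top 1 accuracy (80%).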

A cfDNA-based machine learning classifier integrating multiple genomic and epigenomic features accurately identifies cancer tissue of origin, demonstrating robust performance even in low-ctDNA and diagnostically challenging cases.

Abstract

Background: Determination of the tissue of origin (TOO) of cancer is essential for appropriate clinical management and treatment selection. Liquid biopsy using circulating cell-free DNA (cfDNA) offers a non-invasive approach for cancer detection and TOO prediction. Circulating tumor DNA (ctDNA), the tumor-derived fraction of cfDNA, carries genomic and epigenomic signatures reflective of its origin. Recent advances in machine learning have enabled the development of models that predict TOO from cfDNA profiles. However, current methods show variable performance, particularly in samples with low ctDNA fractions (ctDNA < 3%), and accuracy remains inconsistent across cancer types.

Methods: We developed a TOO classifier using whole genome cfDNA profiles from 1814 patients across 17 cancer types. Multiple distinct cfDNA features were extracted to capture diverse cancer-associated alterations, including copy number variations, repeat elements, fragment end motifs associated with DNA methylation, fragment size distribution and coverage, microsatellite instability, mutational signatures, nucleosome occupancy, tissue-specific fragmentation patterns, and the presence of cancer-associated viral DNA. Model performance was evaluated in an independent external cohort of 1221 patients. Additional tests were conducted in cohorts of patients with cancers of unknown primary (CUP) and multiple primary cancers (MPC).

Results: Our cancer classifier achieved an overall top 1 accuracy of 78% and top 2 accuracy of 89% in the training cohort, with consistently high accuracy across all cancer types. In the independent validation cohort, the model maintained robust performance, with top 1 and top 2 accuracies of 80% and 90%, respectively. Sensitivity increased with advancing cancer stage, from 66.8% in stage I to 86.2% in stage IV. Among 612 low-ctDNA samples, 435 (71.1%) were correctly classified. The classifier also showed strong potential in CUP, with 11 of 15 cases (73.3%) aligning with the clinically suspected primary site. Furthermore, among 20 MPC cases with two primary sites, both sites were correctly identified within the top three predictions in 9 cases. In 3 MPC patients with three primary sites, two of the three sites were accurately captured among the top three predictions.

Conclusion: Our cfDNA-based machine learning classifier provides a robust, non-invasive approach for accurate cancer tissue-of-origin identification. Integrating 11 distinct cfDNA-derived fragmentomic, genomic, epigenomic, and microbiomic features, the model achieved high accuracy across multiple cancer types and maintained strong performance in low-ctDNA samples. Its promising results in CUP and MPC further highlight its potential clinical utility in resolving diagnostically challenging cases and guiding precision oncology applications.

Citation Format: Yunjian Zhang, Liang Liu, Hua Bao, Haimeng Tang, Ke Xu, Hao Zhang, Song Wang, Shuang Chang, Dongqin Zhu, Zongyao Huang, Zheng Wang, Liu Yang, Bingzhong Zhang, Ji Tao, Wenhua Liang, Jierong Chen, Shanshan Yang, Xue Wu, Yang Shao, Wenquan Wang, Dongyuan Zhu. Machine learning classifier for cancer type identification via multi-feature genome-wide cfDNA profiling [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 1123.
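
For the MPC cohort, the abstract scores each patient by whether the confirmed primary sites appear among the classifier's top three predictions. One plausible way to express that criterion, assuming each patient has a set of confirmed sites and a ranked list of predicted sites, is a simple set check. The Python sketch below is illustrative only; the function names and patient labels are hypothetical and do not reflect the authors' actual evaluation code.

```python
from typing import Sequence, Set


def all_sites_in_top_k(true_sites: Set[str], ranked_predictions: Sequence[str], k: int = 3) -> bool:
    """True if every confirmed primary site appears among the top-k predicted sites."""
    return true_sites <= set(ranked_predictions[:k])


def sites_captured(true_sites: Set[str], ranked_predictions: Sequence[str], k: int = 3) -> int:
    """Number of confirmed primary sites found among the top-k predicted sites."""
    return len(true_sites & set(ranked_predictions[:k]))


# Hypothetical patients: (confirmed primary sites, classifier ranking), labels invented
two_site_patient = ({"lung", "colorectal"}, ["colorectal", "liver", "lung", "breast"])
three_site_patient = ({"breast", "thyroid", "lung"}, ["breast", "lung", "ovary", "thyroid"])

print(all_sites_in_top_k(*two_site_patient))  # both sites fall within the top 3
print(sites_captured(*three_site_patient))    # 2 of the 3 sites fall within the top 3
```

Counting captured sites, rather than only a pass/fail flag, matches how the abstract reports the three-site patients (two of three sites captured).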

Authors

Yunjian Zhang

Liang Liu

Hua Bao

Journals

Cancer Research

Institutions

Fudan University

Sun Yat-sen University

Guangzhou Medical University
