What question did this study set out to answer?

The aim is to develop an effective model for classifying acute myeloid leukemia risk using multi-omics data.

April 18, 2026 | Open Access

A CCA-RFE-Based Multi-Omics Framework for Acute Myeloid Leukemia (AML) Risk Classification

Key Points

Aimed to develop an effective model for classifying acute myeloid leukemia (AML) risk from multi-omics data.
Proposed a new machine learning model called CCA-RFE selector (CCARS).
Utilized canonical correlation analysis (CCA) to find correlated features across omics data.
Applied recursive feature elimination (RFE) to select the most informative features.
Evaluated with nested cross-validation using AUC-ROC, PR-AUC, accuracy, and F1-score.
Achieved 90% classification accuracy and an F1-score of 0.85.
Outperformed baseline methods including PCA, lasso regression, and CCA.
Validated the framework on an independent gene expression dataset to assess partial generalization.

Abstract

Leukemia subtype and risk prediction remains a major challenge because multi-omics data are high-dimensional and heterogeneous. To address this, the paper proposes a new machine learning model, the CCA-RFE selector (CCARS), which integrates multi-omics data and selects features. The proposed scheme uses canonical correlation analysis (CCA) to identify correlated features across omics layers and recursive feature elimination (RFE) to progressively narrow these down to the most informative features. The framework is evaluated on publicly available leukemia multi-omics data from TCGA-LAML and GEO (GSE37642). Performance is assessed with nested cross-validation using AUC-ROC, PR-AUC, accuracy, and F1-score. The experimental findings indicate that the CCARS framework tends to outperform baseline methods such as PCA, lasso regression, and CCA. In particular, CCARS achieved 90% classification accuracy and an F1-score of 0.85, exceeding the existing models while keeping computation time reasonable. The framework was also validated on an independent gene expression dataset to assess partial generalization, and the findings suggest it can support AML risk classification and biomarker discovery.
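
The evaluation protocol described in the abstract, nested cross-validation scored with AUC-ROC, PR-AUC, accuracy, and F1-score, could look roughly like the sketch below. It is an illustration under stated assumptions rather than the paper's actual setup: the data are synthetic, the pipeline uses RFE with logistic regression only, the fold counts and the `n_features_to_select` grid are arbitrary, and PR-AUC is approximated by scikit-learn's `average_precision` scorer.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-risk data standing in for a real multi-omics matrix.
X, y = make_classification(
    n_samples=150, n_features=40, n_informative=8, random_state=0
)

# Scaling and feature selection live inside the pipeline, so they are
# refit on the training portion of every fold (no leakage from test folds).
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rfe", RFE(LogisticRegression(max_iter=1000), step=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: choose how many features RFE keeps (assumed grid).
search = GridSearchCV(
    pipe,
    param_grid={"rfe__n_features_to_select": [5, 10, 20]},
    scoring="roc_auc",
    cv=inner_cv,
)

# Outer loop: estimate performance of the whole tuned procedure.
scoring = {
    "auc_roc": "roc_auc",
    "pr_auc": "average_precision",
    "accuracy": "accuracy",
    "f1": "f1",
}
results = cross_validate(search, X, y, cv=outer_cv, scoring=scoring)

summary = {name: results[f"test_{name}"].mean() for name in scoring}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Keeping the hyperparameter search in the inner loop and the scoring in the outer loop means the reported metrics come from samples that played no part in tuning or feature selection.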
