What question did this study set out to answer?

The study set out to identify a combination of biomarkers for early detection of rare cancers, particularly ovarian cancer, that performs robustly in both discovery and validation.

February 19, 2026 | Open Access

Identifying Predictive Combinations of Biomarkers for Early Cancer Detection with Stability Selection in Combination with Ensemble Learning

Key Points

Aimed to identify a biomarker combination for early detection of rare cancers, particularly ovarian cancer, that holds up in both discovery and validation.
Used stability selection to limit overfitting during biomarker discovery.
Combined four classifiers in an ensemble for prediction: random forest, logistic regression, linear discriminant analysis (LDA), and support vector machine (SVM).
Applied the method to simulated data and to ovarian cancer data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trials.
STABEL (stability selection ensemble learning) showed strong predictive performance relative to existing methods.
Matched or improved on existing methods in sensitivity, specificity, and prediction accuracy.
Showed high area under the curve (AUC) for the selected biomarkers.

Abstract

Certain rare cancers such as ovarian or pancreatic cancer would benefit if detected early at a stage when they are resectable. Unfortunately, approved biomarkers for these cancers are not adequate for screening the general population, and it is unlikely that a single marker will meet the performance criteria for screening. Determining a combination of biomarkers for early detection of rare cancers is a challenge. Often model selection suffers from overfitting in the discovery phase, which leads to poor performance upon validation. Since ovarian cancer has a poor prognosis, we aim to identify biomarkers that perform robustly in both the discovery and validation phases of early cancer detection. Stability selection methods have been used to prevent overfitting and to reliably select truly expressed biomarkers. Ensemble learning methods provide robust prediction results in the face of model misspecification. We present a novel framework, stability selection ensemble learning (STABEL), with a biomarker selection stage based on stability selection and a prediction stage using an ensemble of machine learning (ML) methods. The ensemble consists of random forest (RF), logistic regression (LR), linear discriminant analysis (LDA), and support vector machine (SVM). Simulated data, along with ovarian cancer data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trials, are used to test the methodology. Our results demonstrate the power of integrating ensemble learning with stability selection to select cancer biomarkers and achieve strong predictive performance. The STABEL technique outperforms or performs similarly to the existing methods in terms of sensitivity, specificity, prediction accuracy, and area under the curve.
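
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the base selector (here an L1-penalized logistic regression refit on stratified half-samples), the selection threshold of 0.6, the regularization strength `C=0.1`, the soft-voting rule used to combine the four classifiers, and the synthetic data in which only the first three of thirty hypothetical markers carry signal. The function names `stability_selection` and `build_ensemble` are likewise illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def stability_selection(X, y, n_subsamples=50, C=0.1, threshold=0.6, seed=0):
    """Return (indices of stable features, per-feature selection frequency).

    Assumption: the base selector is an L1-penalized logistic regression;
    a feature counts as "selected" in a subsample if its coefficient is nonzero.
    """
    splitter = StratifiedShuffleSplit(
        n_splits=n_subsamples, train_size=0.5, random_state=seed
    )
    counts = np.zeros(X.shape[1])
    for half_idx, _ in splitter.split(X, y):
        selector = make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty="l1", solver="liblinear", C=C),
        )
        selector.fit(X[half_idx], y[half_idx])
        counts += selector[-1].coef_.ravel() != 0
    freq = counts / n_subsamples
    # Keep only features chosen in at least `threshold` of the subsamples.
    return np.flatnonzero(freq >= threshold), freq


def build_ensemble(seed=0):
    """Soft-voting ensemble of RF, LR, LDA, and SVM (combination rule assumed)."""
    return VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=200, random_state=seed)),
            ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
            ("lda", LinearDiscriminantAnalysis()),
            ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=seed))),
        ],
        voting="soft",  # average the predicted class probabilities
    )


# Synthetic stand-in data: 30 hypothetical markers, only columns 0-2 informative.
rng = np.random.default_rng(0)
n_samples, n_features = 600, 30
X = rng.normal(size=(n_samples, n_features))
latent = 2.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2]
y = (latent + rng.logistic(size=n_samples) > 0).astype(int)

# Hold out a validation set before any selection to avoid leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Stage 1: stability selection on the training data only.
selected, freq = stability_selection(X_train, y_train)

# Stage 2: fit the ensemble on the selected markers, evaluate on held-out data.
ensemble = build_ensemble().fit(X_train[:, selected], y_train)
auc = roc_auc_score(y_test, ensemble.predict_proba(X_test[:, selected])[:, 1])

print("selected markers:", selected.tolist())
print("held-out AUC:", round(auc, 3))
```

Running selection only on the training split mirrors the discovery and validation separation the abstract emphasizes: markers are chosen without sight of the held-out samples, so the reported AUC is not inflated by the selection step.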
