Abstract: Patients with certain rare cancers, such as ovarian or pancreatic cancer, would benefit from detection at an early stage, when the tumor is still resectable. Unfortunately, the approved biomarkers for these cancers are not adequate for screening the general population, and it is unlikely that any single marker will meet the performance criteria for screening. Identifying a combination of biomarkers for early detection of rare cancers is therefore a challenge. Model selection often overfits in the discovery phase, which leads to poor performance upon validation. Because ovarian cancer has a poor prognosis, we aim to identify biomarkers for its early detection that perform robustly in both the discovery and validation phases. Stability selection methods have been used to prevent overfitting and to reliably select truly informative biomarkers. Ensemble learning methods provide robust predictions in the face of model misspecification. We present a novel framework, stability selection ensemble learning (STABEL), which combines a biomarker selection stage based on stability selection with a prediction stage based on an ensemble of machine learning (ML) methods. The ensemble consists of random forest (RF), logistic regression (LR), linear discriminant analysis (LDA), and support vector machine (SVM). We test the methodology on simulated data and on ovarian cancer data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. Our results demonstrate the power of integrating ensemble learning with stability selection to select cancer biomarkers and achieve strong predictive performance. STABEL outperforms or performs similarly to existing methods in terms of sensitivity, specificity, prediction accuracy, and area under the curve.
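The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the base selector, the selection threshold, or how the four learners are combined, so the sketch assumes an L1-penalized logistic regression as the base selector, a hypothetical frequency threshold of 0.6, half-size subsamples, and simple majority (hard) voting. The synthetic data, the helper names `selection_frequencies` and `build_ensemble`, and all hyperparameters are likewise illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def selection_frequencies(X, y, n_subsamples=50, C=0.1, seed=0):
    """Stage 1 (assumed form): fraction of half-size subsamples in which
    each feature receives a nonzero L1-logistic coefficient."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        selector = make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty="l1", solver="liblinear", C=C),
        )
        selector.fit(X[idx], y[idx])
        counts += selector[-1].coef_.ravel() != 0
    return counts / n_subsamples


def build_ensemble(seed=0):
    """Stage 2 (assumed form): majority vote over RF, LR, LDA, and SVM."""
    members = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("lda", LinearDiscriminantAnalysis()),
        ("svm", make_pipeline(StandardScaler(), SVC())),
    ]
    return VotingClassifier(members, voting="hard")


# Synthetic stand-in for biomarker data: 30 candidate markers, 4 informative.
X, y = make_classification(
    n_samples=300, n_features=30, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Keep markers selected in at least 60% of subsamples (hypothetical threshold).
freq = selection_frequencies(X_train, y_train)
selected = np.flatnonzero(freq >= 0.6)

# Fit the ensemble on the stable markers only and score on held-out data.
ensemble = build_ensemble().fit(X_train[:, selected], y_train)
accuracy = ensemble.score(X_test[:, selected], y_test)
print("selected markers:", selected.tolist())
print("held-out accuracy:", round(accuracy, 3))
```

Selection is done on the training split only, so the held-out accuracy is not contaminated by the marker selection step. That separation mirrors the discovery and validation distinction the abstract emphasizes.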
Das et al. studied this question.