MOTIVATION: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples obtained in different settings. Cross-validation within exemplary datasets may not adequately reflect performance in the broader application context.
METHODS: We develop and implement a systematic approach to 'cross-study validation', to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets. We illustrate it via simulations and in a collection of eight estrogen receptor-positive breast cancer microarray gene-expression datasets, where the objective is predicting distant metastasis-free survival (DMFS). We compute the C-index for all pairwise combinations of training and validation datasets. We evaluate several alternatives for summarizing the pairwise validation statistics and compare these to conventional cross-validation.
RESULTS: Our data-driven simulations and our application to survival prediction with eight breast cancer microarray datasets suggest that standard cross-validation produces inflated discrimination accuracy for all algorithms considered, relative to cross-study validation. Furthermore, the ranking of learning algorithms differs between the two approaches, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation.
AVAILABILITY: The survHD: Survival in High Dimensions package (http://www.bitbucket.org/lwaldron/survhd) will be made available through Bioconductor.
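The pairwise evaluation described under METHODS can be illustrated with a minimal Python sketch. It is not the survHD implementation (which is an R package); every name here (`c_index`, `cross_study_matrix`, `summarize_off_diagonal`, `fit_direction`, `predict_risk`) and the toy data are hypothetical. The sketch assumes a simplified Harrell-type C-index that skips pairs tied in time, a one-covariate toy "learner", and a plain mean as the summary, which is only one of the possible summaries the abstract alludes to.

```python
from itertools import combinations
from statistics import mean


def c_index(times, events, scores):
    """Simplified Harrell-type C-index (assumption: pairs tied in time are skipped).

    A pair is usable when the subject with the shorter time had an event.
    It is concordant when that subject also has the higher risk score;
    tied scores count as half.
    """
    concordant, usable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a has the shorter time
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    return concordant / usable if usable else float("nan")


def cross_study_matrix(datasets, fit, predict):
    """Z[i][j] = C-index of the model trained on study i and validated on study j.

    Diagonal entries are left as None in this sketch, because training and
    validating on the same study is not an independent validation.
    """
    k = len(datasets)
    Z = [[None] * k for _ in range(k)]
    for i, train in enumerate(datasets):
        model = fit(train)
        for j, test in enumerate(datasets):
            if i != j:
                risk = predict(model, test["x"])
                Z[i][j] = c_index(test["time"], test["event"], risk)
    return Z


def summarize_off_diagonal(Z):
    """One possible summary (an assumption): the mean of all off-diagonal entries."""
    k = len(Z)
    return mean(Z[i][j] for i in range(k) for j in range(k) if i != j)


# Hypothetical toy learner: it only learns the sign of a single covariate.
def fit_direction(train):
    return 1.0 if c_index(train["time"], train["event"], train["x"]) >= 0.5 else -1.0


def predict_risk(model, x):
    return [model * v for v in x]


# Made-up toy studies, for illustration only.
toy_studies = [
    {"time": [1, 2, 3, 4], "event": [1, 1, 1, 1], "x": [4, 3, 2, 1]},
    {"time": [2, 4, 6, 8], "event": [1, 0, 1, 1], "x": [8, 6, 4, 2]},
    {"time": [5, 1, 3], "event": [1, 1, 0], "x": [1, 9, 5]},
]

Z = cross_study_matrix(toy_studies, fit_direction, predict_risk)
print(Z)
print(summarize_off_diagonal(Z))
```

Passing the learner as `fit` and `predict` callables keeps the matrix construction independent of any particular algorithm, so several algorithms can be compared on the same set of studies by calling `cross_study_matrix` once per algorithm.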
Christoph Bernau
Ludwig-Maximilians-Universität München
Markus Riester
Université Paris-Sud
Anne‐Laure Boulesteix
Zimmer Biomet (Netherlands)
Bioinformatics
Boston University
Dana-Farber Cancer Institute
City University of New York
Bernau et al. studied this question.
synapsesocial.com/papers/6a128edfa4bed3c7b1674f3f — DOI: https://doi.org/10.1093/bioinformatics/btu279