Recent work on double descent has challenged the classical bias-variance tradeoff, showing that test error can decrease, rise sharply near the interpolation threshold, and then decrease again as model capacity grows. This phenomenon has been documented in regression and classification, but its relevance to survival analysis remains unclear. Survival data are subject to censoring, which obscures true event times, and widely used models such as the Cox proportional hazards model are optimized via partial likelihoods that emphasize ranking rather than calibrated risk estimation. It is therefore unknown whether double descent occurs in this setting, how censoring influences its manifestation, or how it interacts with standard performance metrics. We investigate these questions using synthetic survival data generated from Weibull hazards with controlled censoring, allowing systematic variation of model capacity from the under-parameterized to the over-parameterized regime. We verify that double descent occurs in survival models, but calibration plateaus and decouples from discrimination, even under strong regularization. This decoupling arises because the Cox partial likelihood optimizes rankings rather than magnitudes, producing extreme risk scores that break the Breslow estimator used to estimate survival probabilities. Validation on two real-world clinical datasets, the METABRIC breast cancer cohort and the SUPPORT study of seriously ill hospitalized adults, confirms the calibration-discrimination decoupling: the integrated Brier score (IBS) saturates at a constant value as network width grows while concordance varies, reproducing the primary synthetic finding across distinct clinical domains and sample sizes. These results highlight the limitations of discrimination-based model selection in survival analysis and underscore the need for calibration-aware evaluation in high-capacity prognostic models.
Hart et al. studied this question.
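
The decoupling of discrimination from calibration described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' experimental setup: the function names (`simulate_weibull`, `concordance`, `breslow_survival`), the coefficient vector, the Weibull shape, the censoring hazard, and the inflation factor of 8 are all assumptions chosen for illustration. Instead of fitting a neural Cox model, the sketch takes the true linear predictor as the risk score and multiplies it by a constant to mimic the extreme scores the abstract attributes to high-capacity models. Concordance depends only on the ordering of the scores, so it is unchanged, while the Breslow-based survival probabilities are pushed toward 0 or 1.

```python
import numpy as np

def simulate_weibull(n, beta, shape=1.5, censor_hazard=0.3, seed=0):
    """Weibull proportional-hazards event times with independent exponential censoring.
    All parameter values are illustrative assumptions, not taken from the paper."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, beta.size))
    eta = x @ beta  # true linear predictor (log relative hazard)
    # Inverse-transform sampling from S(t | x) = exp(-t**shape * exp(eta))
    u = rng.uniform(size=n)
    t_event = (-np.log(u) / np.exp(eta)) ** (1.0 / shape)
    t_cens = rng.exponential(scale=1.0 / censor_hazard, size=n)
    time = np.minimum(t_event, t_cens)
    event = (t_event <= t_cens).astype(int)
    return eta, time, event

def concordance(time, event, score):
    """Harrell-style C-index: a pair is comparable if the earlier time is an observed event."""
    comparable = (time[:, None] < time[None, :]) & (event[:, None] == 1)
    diff = score[:, None] - score[None, :]
    concordant = np.sum(comparable & (diff > 0))
    tied = np.sum(comparable & (diff == 0))
    return (concordant + 0.5 * tied) / np.sum(comparable)

def breslow_survival(time, event, score, t_eval):
    """Predicted S(t_eval | x_i) using the Breslow baseline cumulative hazard."""
    order = np.argsort(time)
    time_s, event_s = time[order], event[order]
    risk = np.exp(score[order])
    # Sum of exp(score) over everyone still at risk at each sorted time
    risk_set = np.cumsum(risk[::-1])[::-1]
    mask = time_s <= t_eval
    h0 = np.sum(event_s[mask] / risk_set[mask])
    return np.exp(-h0 * np.exp(score))

def extreme_fraction(surv, eps=0.01):
    """Share of predictions within eps of 0 or 1."""
    return float(np.mean((surv < eps) | (surv > 1.0 - eps)))

eta, time, event = simulate_weibull(n=300, beta=np.array([1.0, -1.0, 0.5]))
t_eval = np.median(time)

inflated = 8.0 * eta  # same ranking, much larger magnitudes

c_orig = concordance(time, event, eta)
c_inflated = concordance(time, event, inflated)
extreme_orig = extreme_fraction(breslow_survival(time, event, eta, t_eval))
extreme_inflated = extreme_fraction(breslow_survival(time, event, inflated, t_eval))

print("concordance:", c_orig, c_inflated)
print("share of near-0/1 survival predictions:", extreme_orig, extreme_inflated)
```

The two concordance values are identical because multiplying every score by a positive constant leaves pairwise comparisons intact. The share of saturated survival predictions rises sharply, which is the kind of degradation a calibration metric such as the IBS would register and a ranking metric would not.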