Abstract

Regression analysis with missing data is a long-standing and challenging problem, particularly when many variables have missing values with arbitrary missingness patterns. Likelihood-based methods, although theoretically appealing, are often computationally inefficient or even infeasible when the number of variables with missing values is large. In this paper, we consider the Cox regression model with incomplete covariates that are missing at random. We develop an expectation-maximization (EM) algorithm for nonparametric maximum likelihood estimation, employing a transformation technique in the E-step so that it involves only a one-dimensional integration. This innovation makes our methods computationally tractable even when the number of variables with missing values is large. In addition, for variable selection, we extend the proposed EM algorithm to accommodate a Lasso penalty in the likelihood. We demonstrate the feasibility and advantages of the proposed methods through large-scale simulation studies and apply them to a cancer genomic study.
Kwok et al. (Sun,) studied this question.