Objective: Missing data is one of the main problems in logistic regression analysis, and imputation is a popular way to handle it. Multiple imputation by chained equations (MICE) is widely used because it does not require assuming a particular joint distribution (e.g., a multivariate normal distribution). MICE offers 5 multiple imputation methods. In this study, we compare these 5 methods for binary logistic regression with missing covariates. Material and Methods: We evaluated the performance of the 5 methods on data simulated in the R programming language. We first generated a design matrix with sample size n and p independent variables from a multivariate normal distribution, N(μ, Σ). We then generated the response variable from a Bernoulli distribution with sample size n. We deleted 10%, 20%, and 30% of the complete data under missing completely at random (MCAR) and missing at random (MAR) mechanisms. We ran 1,000 simulation repetitions. Results: Across the simulated scenarios, MICE using linear regression with bootstrap (MICE-BOOT) gave the least biased results and the lowest mean squared error (MSE) in most scenarios. MICE-random forest gave the most biased results and the highest MSE. Conclusion: Because no previous study has compared the 5 MICE methods for logistic regression with missing covariates, we could not compare our results with those of earlier studies. MICE-BOOT can be used for binary logistic regression with missing data.
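The data-generating steps described in the Material and Methods can be sketched as follows. This is only an illustration under stated assumptions, not the authors' R code: the function name `simulate_dataset`, the coefficient values, the identity covariance (independent standard normal covariates), the logistic link for the Bernoulli probability, and the choice to delete covariate cells only are all assumptions, since the abstract does not specify them. Only the MCAR mechanism is shown.

```python
import math
import random


def simulate_dataset(n, betas, intercept=0.0, missing_rate=0.2, seed=0):
    """Simulate covariates, a binary response, and MCAR missingness.

    Assumptions (not stated in the abstract): covariates are independent
    standard normal, i.e. N(0, I); the Bernoulli success probability follows
    a logistic model; missingness is applied to covariate cells only.
    """
    rng = random.Random(seed)
    p = len(betas)

    # Design matrix: n rows, p independent N(0, 1) covariates.
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

    # Response: Bernoulli draw with probability given by the logistic function.
    y = []
    for row in X:
        eta = intercept + sum(b * x for b, x in zip(betas, row))
        prob = 1.0 / (1.0 + math.exp(-eta))
        y.append(1 if rng.random() < prob else 0)

    # MCAR: delete a fixed fraction of covariate cells, chosen uniformly
    # at random and independently of any observed or unobserved value.
    n_missing = round(missing_rate * n * p)
    cells = rng.sample(range(n * p), n_missing)
    X_missing = [row[:] for row in X]
    for cell in cells:
        X_missing[cell // p][cell % p] = None

    return X, y, X_missing


if __name__ == "__main__":
    X, y, X_miss = simulate_dataset(n=200, betas=[0.5, -1.0, 0.25], missing_rate=0.2)
    n_deleted = sum(v is None for row in X_miss for v in row)
    print(f"deleted {n_deleted} of {200 * 3} covariate cells")
```

In a full simulation, each repetition would call this with a new seed at missing rates of 0.1, 0.2, and 0.3, impute `X_missing` with each method, refit the logistic regression, and compare the estimated coefficients with `betas` to obtain bias and MSE.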
Tuncay Yanarateş studied this question.