Data may not be fully observed for various reasons. When the missing-data mechanism is missing at random (MAR), the bias introduced by imputation can be reduced by forming imputation classes from observed variables. However, when many observed variables are taken into account, the number of imputation classes can grow substantially. To select only a small number of relevant variables, it has been suggested to create imputation classes with tree-based algorithms. Nevertheless, with high-dimensional data these techniques can still produce an excessive number of imputation classes. This study therefore proposes forming imputation classes with semi-supervised clustering algorithms. Simulations on generated data and on real data compared the proposed technique with complete-case analysis, imputation that ignores imputation classes, and imputation based on tree-based algorithms. The results indicate that the proposed method can reduce bias in the imputed data relative to the other methods and that it can effectively control the number of imputation classes.
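To make the role of imputation classes concrete, here is a minimal sketch of class-mean imputation: each missing value is replaced by the mean of the observed values in its class. This is a generic textbook technique, not the authors' method. The class labels are hard-coded for illustration; in the approach described above they would come from a tree-based or semi-supervised clustering step, which this sketch does not implement. The function name `class_mean_impute` and the toy data are hypothetical. In the toy data, missingness depends only on the observed class (a MAR pattern), so complete-case analysis and class-blind imputation are biased, while within-class imputation recovers the true mean.

```python
from statistics import mean
from typing import Dict, Hashable, List, Optional, Sequence


def class_mean_impute(
    y: Sequence[Optional[float]],
    classes: Sequence[Hashable],
) -> List[float]:
    """Replace each missing value (None) with the observed mean of its class.

    Hypothetical helper for illustration. A class with no observed values
    falls back to the overall observed mean.
    """
    if len(y) != len(classes):
        raise ValueError("y and classes must have the same length")
    observed = [v for v in y if v is not None]
    overall = mean(observed)
    # Collect observed values per imputation class.
    by_class: Dict[Hashable, List[float]] = {}
    for v, c in zip(y, classes):
        if v is not None:
            by_class.setdefault(c, []).append(v)
    class_means = {c: mean(vs) for c, vs in by_class.items()}
    # Keep observed values; fill missing ones from their class mean.
    return [v if v is not None else class_means.get(c, overall)
            for v, c in zip(y, classes)]


if __name__ == "__main__":
    # Toy MAR data: class "B" has higher values and is mostly missing.
    # Hypothetical true values: A = 10, 11, 9, 10 and B = 20, 21, 19, 20 (true mean 15.0).
    y_obs = [10.0, 11.0, 9.0, 10.0, 20.0, None, None, None]
    classes = ["A", "A", "A", "A", "B", "B", "B", "B"]

    complete_case = [v for v in y_obs if v is not None]
    no_classes = class_mean_impute(y_obs, ["all"] * len(y_obs))
    with_classes = class_mean_impute(y_obs, classes)

    print(mean(complete_case))  # 12.0 (biased low)
    print(mean(no_classes))     # 12.0 (biased low)
    print(mean(with_classes))   # 15.0 (matches the true mean)
```

Treating every record as one class ("all") reproduces imputation without imputation classes, which is one of the comparison methods the abstract lists. Class-mean imputation is only one choice of within-class imputer; hot-deck draws from donors in the same class are another common option.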
Lee et al. studied this question.