February 28, 2026 | Open Access

Imputation classes for continuous incomplete data using semi-supervised cluster analysis

Abstract

Data may not be fully observed for various reasons. When the missing data mechanism is missing at random, the bias in imputation can be reduced by forming imputation classes based on observed variables. Imputation classes can be generated from other observed variables; however, the number of imputation classes can increase substantially when many variables are taken into account. To select only a small number of relevant variables, it has been suggested to create imputation classes using tree-based algorithms. Nevertheless, with high-dimensional data, these techniques can still produce an excessive number of imputation classes. Therefore, this study proposes forming imputation classes with semi-supervised clustering algorithms. Simulations based on generated and real data were conducted to compare the proposed techniques with complete-case analysis, imputation without imputation classes, and imputation using tree-based algorithms. The simulations indicate that the proposed method can reduce the bias in imputed data compared with the other methods and that the number of imputation classes can be managed effectively.
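The idea of imputing within classes can be illustrated with a minimal sketch. The abstract does not specify the semi-supervised clustering algorithm, so plain one-dimensional k-means is swapped in here purely as a stand-in; the function names `kmeans_1d` and `impute_by_class` and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on one covariate (a stand-in, not the paper's method)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers evenly spaced over the observed range.
    if k > 1:
        centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    else:
        centers = [mean(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def impute_by_class(y, labels):
    """Replace each missing y (None) with the observed mean of its class."""
    overall = mean(v for v in y if v is not None)
    class_means = {}
    for lab in set(labels):
        vals = [v for v, l in zip(y, labels) if l == lab and v is not None]
        # A class with no observed y falls back to the overall observed mean.
        class_means[lab] = mean(vals) if vals else overall
    return [class_means[l] if v is None else v for v, l in zip(y, labels)]


# Made-up toy data: y = 2 * x, with y missing more often when x is large,
# so missingness depends only on the observed covariate x.
x = [1, 2, 3, 4, 11, 12, 13, 14]
y_full = [2 * v for v in x]
y = [2, 4, 6, None, 22, None, None, None]

labels = kmeans_1d(x, k=2)                    # imputation classes from x
imputed = impute_by_class(y, labels)          # class-mean imputation
naive = impute_by_class(y, [0] * len(y))      # one class: overall-mean imputation

print(mean(y_full), mean(naive), mean(imputed))
```

On this toy data the full-data mean of y is 15, overall-mean imputation gives 8.5, and class-mean imputation gives 13, which shows the direction of the bias reduction the abstract describes. The number of classes `k` is fixed by the caller, which is one way a clustering approach keeps the class count under control.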

Cite This Study

Lee et al. studied this question.

synapsesocial.com/papers/6a0da1bd88250cfcc2a5099c https://doi.org/10.5351/kjas.2026.39.1.035
