
August 19, 2025

High-dimensional data classification: evaluating the impact of missing data imputation methods on performance

Key Points

Advanced imputation methods closely matched complete-data performance at low missing rates (10%–20%), achieving high accuracy.
At higher missing rates (30%–50%), DURR and IURR significantly outperformed the other methods, indicating their robustness.
Simulated datasets with varying correlation structures were used to assess the effectiveness of single and multiple imputation techniques.
A real-world application to a breast cancer gene expression dataset supports these findings, highlighting the importance of reliable imputation methods.

Abstract

This study evaluates the impact of various missing data imputation methods on classification performance in high-dimensional datasets. Simulated datasets (n = 150; p = 500 or p = 1000) with different correlation structures and missing data rates (10%–50%) were analysed to compare the effectiveness of single imputation methods (mean, median, random, K-nearest neighbours (KNN), singular value decomposition (SVD)) and multiple imputation techniques (missing value imputation with random forests (I-RF), multivariate imputation by chained equations with classification and regression trees (MICE-CART), direct use of regularized regression (DURR) and indirect use of regularized regression (IURR)). Classification performance was measured with an extreme learning machine (ELM) classifier and evaluated using the area under the receiver operating characteristic curve (AUC) and balanced accuracy. Results showed that advanced methods (I-RF, MICE-CART, DURR, IURR) closely matched complete-data performance at low missing rates (10%–20%), while DURR and IURR outperformed the others at higher missing rates (30%–50%). A real-world application to a breast cancer gene expression dataset further supports these findings, demonstrating that multiple imputation methods, particularly DURR and IURR, yield the most reliable classification outcomes.
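To make the evaluation pipeline concrete, here is a minimal NumPy sketch that wires together three of the simplest single-imputation baselines named above (mean, median, and "random", interpreted here as drawing from a column's observed values), an ELM classifier, and the two reported metrics. Everything beyond those names is an assumption rather than the authors' setup: the AR(1) correlation structure, the sparse logistic labels, MCAR missingness, the tanh hidden layer with ridge-regularized output weights, the hidden-layer size, the 100/50 train/test split, and every function name (`simulate_data`, `add_mcar`, `impute`, `ELM`, `balanced_accuracy`, `auc`) are hypothetical. The advanced methods (KNN, SVD, I-RF, MICE-CART, DURR, IURR) are not implemented.

```python
import numpy as np


def simulate_data(n=150, p=500, rho=0.5, seed=0):
    """Hypothetical design: AR(1) features and sparse-logistic labels (not the study's)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, p))
    X = np.empty((n, p))
    X[:, 0] = z[:, 0]
    for j in range(1, p):  # AR(1): corr(X_i, X_j) = rho**|i - j|
        X[:, j] = rho * X[:, j - 1] + np.sqrt(1 - rho**2) * z[:, j]
    beta = np.zeros(p)
    beta[:10] = 1.0  # assumption: only the first 10 features carry signal
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    return X, (rng.random(n) < prob).astype(int)


def add_mcar(X, rate, seed=1):
    """Set each entry to NaN independently with probability `rate` (MCAR)."""
    rng = np.random.default_rng(seed)
    X_miss = X.copy()
    X_miss[rng.random(X.shape) < rate] = np.nan
    return X_miss


def impute(X_miss, method="mean", seed=2):
    """Column-wise single imputation: mean, median, or a random observed value."""
    rng = np.random.default_rng(seed)
    X_imp = X_miss.copy()
    for j in range(X_imp.shape[1]):
        col = X_imp[:, j]  # a view, so the writes below update X_imp
        miss = np.isnan(col)
        observed = col[~miss]
        if method == "mean":
            col[miss] = observed.mean()
        elif method == "median":
            col[miss] = np.median(observed)
        elif method == "random":
            col[miss] = rng.choice(observed, size=int(miss.sum()))
        else:
            raise ValueError(f"unknown method: {method}")
    return X_imp


class ELM:
    """ELM sketch: fixed random tanh hidden layer, ridge-fitted output weights."""

    def __init__(self, n_hidden=200, reg=1e-2, seed=3):
        self.n_hidden, self.reg, self.seed = n_hidden, reg, seed

    def _hidden(self, X):
        return np.tanh(((X - self.mu) / self.sd) @ self.W + self.b)

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.mu, self.sd = X.mean(axis=0), X.std(axis=0) + 1e-12
        # Input weights and biases are drawn at random and never trained.
        self.W = rng.standard_normal((X.shape[1], self.n_hidden)) / np.sqrt(X.shape[1])
        self.b = rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        target = 2.0 * y - 1.0  # encode class labels as -1 / +1
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ target)
        return self

    def decision_function(self, X):
        return self._hidden(X) @ self.beta

    def predict(self, X):
        return (self.decision_function(X) > 0).astype(int)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (TPR) and specificity (TNR)."""
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return (tpr + tnr) / 2


def auc(y_true, scores):
    """Mann-Whitney AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


if __name__ == "__main__":
    X, y = simulate_data()
    X_miss = add_mcar(X, rate=0.3)
    train, test = np.arange(100), np.arange(100, 150)
    for method in ("mean", "median", "random"):
        # Imputes all rows for brevity; ideally fit imputation on training rows only.
        X_imp = impute(X_miss, method)
        model = ELM().fit(X_imp[train], y[train])
        auc_val = auc(y[test], model.decision_function(X_imp[test]))
        bacc = balanced_accuracy(y[test], model.predict(X_imp[test]))
        print(f"{method:>6}: AUC={auc_val:.3f}  balanced accuracy={bacc:.3f}")
```

The AUC is computed as the Mann-Whitney statistic, which equals the area under the empirical ROC curve, so no threshold sweep is needed. Replacing `impute` with a KNN, SVD or chained-equations imputer, and pooling results over several imputed datasets for the multiple-imputation methods, would be the natural extension, though the exact pooling used in the study is not stated here.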
