Missing value imputation is a routine step in biomedical data analysis, yet techniques are often not tailored to specific datasets. We propose a systematic framework for selecting imputation methods customized for the unique characteristics of cross-sectional numerical data, with a focus on pain-related biomedical research. This approach generates artificial "diagnostic" missing values by randomly removing entries, allowing for direct assessment of reconstruction accuracy across various algorithms. We introduce two novel classes of diagnostic reference methods: pseudo or "poisoned" imputation methods, which intentionally introduce bias into the imputation, and "calibrating" imputations, which inject controlled random noise for objective evaluation. The framework was tested on synthetic datasets and four biomedical datasets, primarily focusing on pain-related data, employing 29 different imputation methods. Quantitative outputs, including root median square deviation (RMSD), median difference (MD), relative bias, and method categorization, facilitate a comprehensive assessment of imputation quality. The framework consistently identifies the most suitable imputation technique for each dataset, revealing that multivariate methods generally outperform univariate approaches. Benchmarking against poisoned and calibrated references establishes quantifiable thresholds for acceptable imputation errors, while also identifying instances where reliable imputations are unattainable. This systematic framework offers practical and reproducible guidelines for imputing missing values in biomedical contexts, particularly in pain research. By empowering researchers to make informed decisions about imputation, the framework enhances data integrity and the robustness of subsequent analyses. Its model-agnostic nature allows for the integration of various imputation methods, with an automated implementation available in the open-source R package "opImputation."
Lötsch et al. studied this question.
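
The evaluation loop described in the abstract (hide known entries at random, impute them, and compare the reconstruction against the hidden truth and against reference methods) can be illustrated with a minimal Python sketch. It is not the API of the opImputation R package. All function names are hypothetical. The concrete choices are assumptions made for illustration only: a column-maximum fill stands in for a "poisoned" method, true values plus Gaussian noise stand in for a "calibrating" method, and the metrics are read literally as the square root of the median squared deviation and the median of the signed differences.

```python
import numpy as np


def add_diagnostic_missings(data, fraction, rng):
    """Hide a random fraction of entries; return the masked copy and the mask."""
    mask = rng.random(data.shape) < fraction
    masked = data.copy()
    masked[mask] = np.nan
    return masked, mask


def impute_median(masked):
    """Univariate candidate: fill each gap with the observed column median."""
    filled = masked.copy()
    col_medians = np.nanmedian(masked, axis=0)
    rows, cols = np.where(np.isnan(masked))
    filled[rows, cols] = col_medians[cols]
    return filled


def impute_poisoned(masked):
    """Assumed 'poisoned' reference: deliberately biased fill (column maximum)."""
    filled = masked.copy()
    col_max = np.nanmax(masked, axis=0)
    rows, cols = np.where(np.isnan(masked))
    filled[rows, cols] = col_max[cols]
    return filled


def impute_calibrating(masked, truth, noise_sd, rng):
    """Assumed 'calibrating' reference: true values plus controlled noise."""
    filled = masked.copy()
    hidden = np.isnan(masked)
    filled[hidden] = truth[hidden] + rng.normal(0.0, noise_sd, hidden.sum())
    return filled


def rmsd(imputed, truth, mask):
    """Root median square deviation over the diagnostic entries (assumed form)."""
    diff = imputed[mask] - truth[mask]
    return float(np.sqrt(np.median(diff ** 2)))


def median_difference(imputed, truth, mask):
    """Median signed difference over the diagnostic entries (assumed form)."""
    return float(np.median(imputed[mask] - truth[mask]))


# Synthetic, complete data so that the hidden truth is known.
rng = np.random.default_rng(0)
truth = rng.normal(size=(500, 3))
masked, mask = add_diagnostic_missings(truth, fraction=0.1, rng=rng)

imputations = {
    "calibrating": impute_calibrating(masked, truth, noise_sd=0.1, rng=rng),
    "median": impute_median(masked),
    "poisoned": impute_poisoned(masked),
}
scores = {name: rmsd(filled, truth, mask) for name, filled in imputations.items()}
bias = {name: median_difference(filled, truth, mask)
        for name, filled in imputations.items()}

for name in imputations:
    print(f"{name:12s} RMSD={scores[name]:.3f}  MD={bias[name]:.3f}")
```

In this sketch the calibrating and poisoned scores bracket the candidate method, which is the sense in which such references can supply thresholds for acceptable error. On real data that already contains missing values, the diagnostic mask would need to be restricted to observed entries.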