In the rapidly evolving landscape of healthcare AI, clinical prediction models hold immense promise for early disease detection, patient triage, and personalized treatment. However, the real-world clinical datasets powering these models are notoriously imperfect, frequently plagued by missing values due to irregular patient monitoring, disjointed electronic health records (EHR), or human error. How these data gaps are handled can ultimately make or break a model's clinical viability. This work systematically examines the effect of several data imputation procedures on the efficiency, equity, and validity of predictive models. We examine a broad range of missing-data handling techniques, from simple conventional ones (such as mean/median imputation), through established statistical procedures such as multiple imputation by chained equations (MICE), to more sophisticated approaches such as k-nearest neighbors (kNN) imputation and deep learning. To do this, we use a set of clinical datasets exhibiting several missing-data patterns, namely missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), and then evaluate the performance of the resulting predictive models.
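To make the three missingness mechanisms and two of the imputation families concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the synthetic "age" and "lactate" columns, the function names, the missingness rates, and the single-feature kNN are all assumptions chosen for illustration only.

```python
import random
import statistics


def make_rows(n=200, seed=0):
    """Synthetic (age, lactate) pairs; lactate rises with age plus noise (assumed relation)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 90)
        lactate = 1.0 + 0.02 * age + rng.gauss(0, 0.3)
        rows.append((age, lactate))
    return rows


def apply_missingness(rows, mechanism, rate=0.3, seed=1):
    """Return the lactate column with None where a value is masked.

    MCAR: every value is masked with the same probability.
    MAR:  masking depends only on the observed covariate (age).
    MNAR: masking depends on the unobserved value itself (lactate).
    """
    rng = random.Random(seed)
    median_age = statistics.median(age for age, _ in rows)
    median_lactate = statistics.median(lac for _, lac in rows)
    column = []
    for age, lactate in rows:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            p = 2 * rate if age > median_age else 0.0
        elif mechanism == "MNAR":
            p = 2 * rate if lactate > median_lactate else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        column.append(None if rng.random() < p else lactate)
    return column


def mean_impute(column):
    """Replace every missing value with the mean of the observed values."""
    observed_mean = statistics.fmean(v for v in column if v is not None)
    return [observed_mean if v is None else v for v in column]


def knn_impute(ages, column, k=5):
    """Replace each missing value with the mean of the k observed rows closest in age."""
    observed = [(a, v) for a, v in zip(ages, column) if v is not None]
    imputed = []
    for age, value in zip(ages, column):
        if value is None:
            nearest = sorted(observed, key=lambda pair: abs(pair[0] - age))[:k]
            value = statistics.fmean(v for _, v in nearest)
        imputed.append(value)
    return imputed


def imputation_mae(truth, imputed, column):
    """Mean absolute error, computed over the masked entries only."""
    errors = [abs(t - i) for t, i, c in zip(truth, imputed, column) if c is None]
    return statistics.fmean(errors)


if __name__ == "__main__":
    rows = make_rows()
    ages = [age for age, _ in rows]
    truth = [lactate for _, lactate in rows]
    for mechanism in ("MCAR", "MAR", "MNAR"):
        column = apply_missingness(rows, mechanism)
        mean_err = imputation_mae(truth, mean_impute(column), column)
        knn_err = imputation_mae(truth, knn_impute(ages, column), column)
        print(f"{mechanism}: mean MAE={mean_err:.3f}  kNN MAE={knn_err:.3f}")
```

In this toy setup the masked entries under MAR and MNAR are not a random sample of the column, so the observed mean is a biased fill-in value, whereas a method that conditions on an observed covariate can recover part of the signal under MAR. A real study would typically rely on an established library implementation and evaluate downstream model performance rather than imputation error alone.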
Roma Chaurasia, Dr. Mohammad Suaib, Dr. Manish Madhava Tripathi studied this question.