Big data driven predictive modeling using integrated datasets demonstrated comparable accuracy to baseline small data models for predicting 30-day readmission risk in CHF over millions of records.
Developing holistic predictive modeling solutions for risk prediction is extremely challenging in healthcare informatics. Risk prediction involves integrating clinical factors with socio-demographic factors, health conditions, disease parameters, hospital care quality parameters, and a variety of variables specific to each health care provider, which makes the task increasingly complex. Unsurprisingly, many such factors need to be extracted independently from different sources and then integrated to improve the quality of predictive modeling. Such sources are typically voluminous and diverse, and they vary significantly over time. Therefore, distributed and parallel computing tools, collectively termed big data, have to be developed. In this work, we study big data driven solutions to predict the 30-day risk of readmission for congestive heart failure (CHF) incidents. First, we extract useful factors from the National Inpatient Dataset (NIS) and augment them with our patient dataset from Multicare Health System (MHS). Then, we develop scalable data mining models to predict the risk of readmission using the integrated dataset. We demonstrate the effectiveness and efficiency of the open-source predictive modeling framework we used, describe the results from the various modeling algorithms we tested, and compare the performance against previously published baseline results from non-distributed, non-parallel, non-integrated small data models, showing comparable accuracy over millions of records.
Zolfaghar et al. evaluated big data driven predictive modeling against baseline non-distributed, non-parallel, non-integrated small data models for predicting the 30-day risk of readmission in congestive heart failure (CHF).
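To make the prediction target concrete, the following is a minimal sketch of how a 30-day readmission label could be derived from admission records. It is not the authors' implementation: the function name `label_30_day_readmissions`, the tuple layout, and the sample records are all hypothetical, and the sketch assumes the simplest definition (any new admission for the same patient within 30 days of a discharge). Real studies typically apply further rules, such as restricting to CHF diagnoses or excluding transfers and planned readmissions, which are omitted here.

```python
from collections import defaultdict
from datetime import date


def label_30_day_readmissions(stays, window_days=30):
    """Flag each stay that is followed by another admission for the same
    patient within `window_days` days of its discharge.

    `stays` is a list of (patient_id, admit_date, discharge_date) tuples
    (a hypothetical layout chosen for illustration).
    Returns a list of 0/1 labels aligned with `stays`.
    """
    # Group stays by patient, remembering each stay's original position.
    by_patient = defaultdict(list)
    for idx, (pid, admit, discharge) in enumerate(stays):
        by_patient[pid].append((admit, discharge, idx))

    labels = [0] * len(stays)
    for visits in by_patient.values():
        visits.sort()  # chronological order by admission date
        # Compare each stay with the same patient's next stay.
        for (_, discharge, idx), (next_admit, _, _) in zip(visits, visits[1:]):
            gap = (next_admit - discharge).days
            if 0 <= gap <= window_days:
                labels[idx] = 1
    return labels


# Made-up example records, not data from the study.
stays = [
    ("p1", date(2013, 1, 1), date(2013, 1, 5)),
    ("p1", date(2013, 1, 20), date(2013, 1, 25)),  # 15 days after first discharge
    ("p2", date(2013, 2, 1), date(2013, 2, 3)),
    ("p1", date(2013, 6, 1), date(2013, 6, 4)),    # well beyond the 30-day window
]
print(label_30_day_readmissions(stays))  # [1, 0, 0, 0]
```

Labels of this kind would serve as the binary outcome for whichever classifier is trained on the integrated dataset; at the scale of millions of records, the same grouping logic would presumably be expressed in a distributed framework rather than in a single-process loop.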