Abstract

Objective
Although machine learning (ML) holds significant potential to transform healthcare, there has been a recent surge in research output that often lacks methodological rigor, contributing to a reproducibility crisis. Additionally, the growing reliance on electronic health records (EHR) for developing ML models has heightened concerns about patient data privacy. To tackle these challenges, we have extended the open-source i2b2 (Informatics for Integrating Biology and the Bedside) platform to allow researchers to train and run ML models without requiring manual programming or direct access to patient-level data.

Materials and Methods
We have developed a proof-of-concept ML module for the i2b2 platform for creating and executing ML models. We describe the design of the module and demonstrate its use on a publicly available Kaggle dataset. Next we test its scalability on a large real-world dataset.

Results
Model training with EHR of 100,000 patients randomly selected from the MIMIC-IV dataset was completed in 75.8 minutes and the developed model was applied to classify 28,985 patients in 1.61 minutes.

Discussion
Implementation of the ML functionalities of the i2b2-ML module was successfully evaluated with a publicly available dataset. The developed module allows seamless training and execution of ML models without the need for manual programming and export of patient-level data, thus addressing many of the challenges associated with data privacy and reproducibility.

Conclusion
In summary, the developed i2b2-ML module can reduce the technical overhead for researchers for applying ML to health data. Future work will focus on improving the i2b2 graphical interface to further simplify the use of the ML module and on streamlining the distribution of the ML module to existing i2b2 installations, so researchers can more easily analyze EHR data that exists in their current installations.
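To put the reported timings in perspective, the short sketch below converts them into average throughput. It uses only the patient counts and durations stated in the Results; the function name `patients_per_minute` is a hypothetical helper introduced here for illustration, and the figures are simple averages that do not imply the module scales linearly.

```python
def patients_per_minute(n_patients: int, minutes: float) -> float:
    """Average number of patients processed per minute (illustrative helper)."""
    return n_patients / minutes


# Figures taken from the Results section of the abstract.
train_rate = patients_per_minute(100_000, 75.8)  # model training on the MIMIC-IV sample
infer_rate = patients_per_minute(28_985, 1.61)   # applying the trained model

# Average wall-clock time per patient, in milliseconds.
train_ms_per_patient = 75.8 * 60_000 / 100_000
infer_ms_per_patient = 1.61 * 60_000 / 28_985

print(f"training:  {train_rate:.0f} patients/min, {train_ms_per_patient:.1f} ms/patient")
print(f"inference: {infer_rate:.0f} patients/min, {infer_ms_per_patient:.1f} ms/patient")
```

On these numbers, training averages roughly 1,300 patients per minute, and classification runs at roughly 18,000 patients per minute.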
Klann et al. studied this question.