Objective: Systematic review to assess the methodology and reporting quality of studies applying machine learning (ML) to develop prediction models for weaning and extubation from invasive mechanical ventilation.

Methods and analysis: A protocol was registered (PROSPERO CRD420250651389), and a search strategy was developed for MEDLINE (Ovid), Embase and PubMed covering 1 January 2015 to 19 February 2025. Prospective or retrospective studies using ML to predict weaning or extubation from invasive mechanical ventilation in adults and children were included; preprints and studies assessing non-invasive ventilation were excluded. Search results were screened independently, and data were extracted into a proforma. Data on methodological approaches were collected using the Transparent Reporting of a multivariable model for Individual Prognosis or Diagnosis+Artificial Intelligence (TRIPOD+AI) checklist as a framework. Risk of bias was assessed using the Prediction model Risk Of Bias Assessment Tool+Artificial Intelligence (PROBAST+AI). Results were presented descriptively or summarised in tables and charts.

Results: Of 1245 studies identified, 40 were included in the final review. These were predominantly retrospective (90%) and single-centre, and 85% lacked external validation. Logistic regression (50%), random forest (50%) and XGBoost (45%) were the most commonly used ML architectures. Reporting of data preprocessing, management of missing data and feature selection varied widely and was inconsistent. Outcome definitions were highly heterogeneous, with limited use of consensus criteria. Most studies did not incorporate time-series data, instead using the mean or last value within a feature window. Model discrimination was reported by all studies (100%), but calibration (35%) and net benefit analysis (13%) were reported far less often. Interpretability was demonstrated using post hoc methods, such as SHapley Additive exPlanations (43%), that align poorly with clinical reasoning. Few studies (20%) demonstrated clinical implementation. Overall, 83% of included studies were classified as at high risk of bias in at least one domain.

Conclusion: This systematic review of 40 studies demonstrated methodological and reporting flaws, with over 80% of studies at high risk of bias in at least one domain. Future studies should, where possible, use prospective, multicentre data and externally validate their findings; report design and performance according to the TRIPOD+AI guidelines; use consensus-based outcome criteria to enable comparison between studies; use architectures that leverage time-series data; align interpretability methods with specific downstream tasks; and engage clinician end users in model development.

PROSPERO registration number: CRD420250651389.
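The gap between discrimination (reported by every study), calibration and net benefit can be illustrated with a minimal Python sketch using only the standard library. It assumes a binary outcome and predicted probabilities from some model; the function names, the outcome coding and the toy data are hypothetical and not drawn from any included study. Net benefit uses the standard decision-curve formula NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the risk threshold at which a clinician would act.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: area under the ROC curve, computed as the probability
    that a randomly chosen event case has a higher predicted risk than a
    randomly chosen non-event case (ties count as half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(pos) * len(neg))


def calibration_in_the_large(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """One calibration summary: observed event rate minus mean predicted risk.
    Zero means no systematic over- or under-prediction on average; it says
    nothing about calibration across the risk range (slope, calibration plot)."""
    n = len(y_true)
    return sum(y_true) / n - sum(y_prob) / n


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit of acting on the model at risk threshold pt:
    NB = TP/n - (FP/n) * pt / (1 - pt)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold pt must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy: treat every patient as high risk (all predicted positive)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Hypothetical toy data: 1 = event (assumed here to mean extubation failure), 0 = no event.
y = [1, 1, 0, 1, 0, 0, 1, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

print(auroc(y, p))                     # 0.75
print(calibration_in_the_large(y, p))  # approximately 0.0 (floating-point rounding)
print(net_benefit(y, p, 0.5))          # 0.25
print(net_benefit_treat_all(y, 0.5))   # 0.0
```

A model can discriminate reasonably well yet be miscalibrated, or offer no net benefit over a treat-all strategy at clinically relevant thresholds, which is why reporting discrimination alone is insufficient.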
Murali et al. studied this question.