What does this research mean for the field?

Large language models, including ChatGPT-5, Gemini, and Copilot, demonstrate poor agreement with emergency physicians on initial mechanical ventilator settings for intubated patients in the emergency department. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

This study aims to evaluate the agreement between ventilator settings recommended by AI models and those set by emergency physicians.

May 30, 2026 · Open Access

Artificial intelligence in emergency mechanical ventilation: a prospective observational comparison with emergency physician ventilator settings

Key Points

Prospective observational study conducted at a single center; 30 intubated patients included over three months.
Ventilator settings recommended by three AI models (ChatGPT-5, Gemini, Copilot) were compared with those set by emergency physicians (EPs).
Agreement on ventilator mode was assessed with Cohen’s kappa; continuous ventilator parameters were evaluated using Bland–Altman analysis.
ChatGPT-5 showed 50.0% agreement with the EPs (Cohen's kappa: 0.199; 95% CI: −0.087 to 0.486).
Google Gemini showed 43.3% agreement with the EPs (Cohen's kappa: 0.164; 95% CI: −0.098 to 0.426).
Microsoft Copilot showed only 20.0% agreement with the EPs (Cohen's kappa: −0.043; 95% CI: −0.230 to 0.143).
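Raw percent agreement and Cohen’s kappa can diverge because kappa discounts the agreement expected by chance from each rater's category frequencies: κ = (p_o − p_e) / (1 − p_e). Under that standard formula, 50.0% raw agreement with a kappa of 0.199, as reported for ChatGPT-5, implies that roughly 38% agreement would be expected by chance alone. The minimal Python sketch below assumes the unweighted kappa; the study does not state its software or whether any weighting was used, and the function name and the six mode labels are hypothetical, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1:
        # Both raters used the same single category throughout; kappa is undefined.
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ventilation modes for six patients (illustrative only, not study data).
physician_modes = ["VCV", "VCV", "SIMV", "SIMV", "PCV", "VCV"]
model_modes = ["VCV", "PCV", "VCV", "SIMV", "PCV", "VCV"]
kappa = cohens_kappa(physician_modes, model_modes)
print(round(kappa, 3))  # prints 0.478
```

With these made-up labels the raters agree on 4 of 6 patients (66.7%), yet kappa is only 11/23 (about 0.478), because both raters favour VCV and would often match by chance.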

Abstract

Artificial intelligence (AI) has the potential to support clinicians in high-risk, complex decisions such as mechanical ventilation. This prospective observational study compared the mechanical ventilator settings chosen by emergency physicians (EPs) with recommendations generated by three large language models (ChatGPT-5, Gemini, and Copilot) in the emergency department (ED).

This prospective, analytical, single-center study included 30 intubated patients managed in the ED over a three-month period. Clinical data, including diagnoses, vital signs, and initial arterial blood gas parameters, were presented to ChatGPT-5, Gemini, and Copilot. The models’ recommendations for ventilation mode, tidal volume, respiratory rate, positive end-expiratory pressure (PEEP), and fraction of inspired oxygen (FiO₂) were compared with the initial settings chosen by the EPs. Agreement on ventilation mode was assessed with Cohen’s kappa, and agreement on continuous ventilator parameters with Bland–Altman analysis.

The median age of the 30 patients was 73 years (IQR: 60–84), and 66.7% were male. The EPs most commonly used volume-controlled ventilation (VCV; 46.7%) and synchronized intermittent mandatory ventilation (SIMV; 40.0%). ChatGPT-5 primarily recommended VCV (76.7%) and, less often, continuous positive airway pressure (CPAP; 10.0%); Gemini most frequently recommended VCV (56.7%) and pressure-controlled ventilation (PCV; 43.3%); and Copilot predominantly recommended PCV (70.0%). For diagnosis-based ventilation mode selection, all three models showed ‘poor’ agreement with the EPs’ expert choices: ChatGPT-5 showed 50.0% agreement (Cohen’s kappa: 0.199; 95% confidence interval (CI): −0.087 to 0.486), Google Gemini 43.3% (kappa: 0.164; 95% CI: −0.098 to 0.426), and Microsoft Copilot 20.0% (kappa: −0.043; 95% CI: −0.230 to 0.143).

Agreement between AI-generated ventilator settings and the EPs’ settings was limited. Current AI models may offer supportive input; however, these findings should be interpreted as preliminary and exploratory, and larger multicenter studies are needed to validate them.
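Bland–Altman analysis summarizes agreement on a continuous setting such as tidal volume by the mean paired difference (the bias) and the 95% limits of agreement, conventionally bias ± 1.96 × the standard deviation of the differences. The sketch below is a minimal illustration of that conventional definition, assuming roughly normally distributed differences; the abstract does not report the study's bias or limits, and the function name and tidal volume values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reference, comparison):
    """Bias and 95% limits of agreement for paired measurements (comparison minus reference)."""
    if len(reference) != len(comparison) or len(reference) < 2:
        raise ValueError("need at least two paired measurements of equal length")
    # Paired differences, e.g. AI-recommended value minus physician-set value.
    diffs = [c - r for r, c in zip(reference, comparison)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    # Conventional limits assume the differences are approximately normal.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical tidal volumes in mL for four patients (illustrative only, not study data).
physician_vt = [450, 500, 420, 480]
model_vt = [480, 500, 450, 470]
bias, lower, upper = bland_altman(physician_vt, model_vt)
print(f"bias {bias:.1f} mL, limits of agreement {lower:.1f} to {upper:.1f} mL")
```

Expressing the limits in the setting's own units (mL for tidal volume, breaths/min for respiratory rate) lets a clinician judge whether the disagreement would matter at the bedside, which a correlation coefficient alone would not show.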
