What does this research mean for the field?

Certain large language models (LLMs) can achieve moderate concordance with real-world triage scores in emergency department settings based on vital sign data. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to evaluate the effectiveness of large language models in assigning triage scores using vital sign data.

March 7, 2026 | Open Access

Large Language Models (LLM) for Emergency Department Triage Based on Vital Signs

Key Points

The aim is to evaluate the effectiveness of large language models in assigning triage scores using vital sign data.
Twelve widely available large language models were tested on patient triage vital sign data.
Each model assigned a triage score based solely on the vital sign data provided.
The deviation between model scores and actual triage scores was calculated and averaged for comparison.
Claude Sonnet 4.5 had the lowest average absolute value of deviation (AAVD: 0.37), with 62.37% concordance.
ChatGPT-5 Instant followed closely on deviation (AAVD: 0.39) and had a marginally higher concordance of 62.89%.
Gemini 2.5 Flash had the lowest concordance at 43.81%.

Abstract

Introduction: Large language models (LLMs) have proven effective in many fields, including the allocation of scarce resources. Triage within emergency departments (EDs) is a core process that ensures the sickest patients are seen in a timely manner. Relatively little research has examined the use of existing LLMs in the triage process.

Methods: Twelve widely available LLMs were provided with real-world patient triage vital sign data from an academic trauma center in a major metropolitan area. The LLMs were asked to assign a triage score to each patient based on this information alone. For each patient, the deviation between the LLM triage score and the real-world triage score was calculated; the absolute values of these deviations were then averaged across the entire dataset for each LLM. The resulting average absolute value of deviation (AAVD) was used to compare LLMs against each other. All LLMs were blinded to the real-world triage score and received no additional training or instruction.

Results: The models with the highest concordance with real-world triage scores were Claude Sonnet 4.5 (AAVD: 0.37; 62.37% concordance), ChatGPT-5 Instant (AAVD: 0.39; 62.89% concordance), and Claude Opus 4.1 (AAVD: 0.40; 62.37% concordance). The least accurate models were Gemini 2.5 Flash (AAVD: 0.42; 43.81% concordance), ChatGPT-4o Mini (AAVD: 0.49; 45.36% concordance), and ChatGPT-o3 (AAVD: 0.48; 48.45% concordance).

Conclusions: This study analyzes the ability of LLMs to triage emergency department patients based primarily on vital sign data. Certain LLMs demonstrated moderate concordance with real-world triage scores. LLMs may be able to synthesize objective vital sign data and provide a triage recommendation. Further study could involve clinical validation against patient outcomes.
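The two metrics reported in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes triage scores are integers on a single ordinal scale and that "concordance" means the percentage of patients for whom the model score exactly matches the real-world score; the abstract does not define concordance, so that reading is an assumption. The function names and the toy data are hypothetical.

```python
def aavd(model_scores, reference_scores):
    """Average absolute value of deviation (AAVD) between model and real-world scores."""
    if not model_scores or len(model_scores) != len(reference_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    deviations = [abs(m - r) for m, r in zip(model_scores, reference_scores)]
    return sum(deviations) / len(deviations)


def concordance(model_scores, reference_scores):
    """Percentage of patients with an exact score match (assumed definition)."""
    if not model_scores or len(model_scores) != len(reference_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(m == r for m, r in zip(model_scores, reference_scores))
    return 100.0 * matches / len(reference_scores)


# Hypothetical toy data: five patients, not taken from the study.
reference = [2, 3, 3, 4, 1]  # real-world triage scores
model = [2, 3, 4, 4, 2]      # scores assigned by one LLM

print(aavd(model, reference))         # 0.4
print(concordance(model, reference))  # 60.0
```

Under these assumptions the two metrics can rank models differently: AAVD penalizes the size of each miss, while concordance only counts exact matches. That may be why Claude Sonnet 4.5 has the lowest AAVD while ChatGPT-5 Instant has a slightly higher concordance.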


Cite This Study

Lederer et al. studied this question.

synapsesocial.com/papers/69abc2615af8044f7a4ebeda https://doi.org/10.3390/ecm3010009
