What question did this study set out to answer?

This study aimed to assess whether combining LEAN and SAFE persona-routing strategies can lower safety and monotonicity violations in emergency reasoning while maintaining efficiency.

April 25, 2026 | Open Access

Persona Routing Associated With Fewer Safety and Monotonicity Violations in Simulated Emergency Large Language Model (LLM) Reasoning

Key Points

The study assessed whether routing between LEAN and SAFE personas can lower safety and monotonicity violations in simulated emergency reasoning while maintaining efficiency.
The evaluation covered 28 synthetic emergency vignettes, using ChatGPT outputs in a simulated environment.
Four personas were compared (high/low time pressure × LEAN/SAFE), along with three routing strategies that escalate from the LEAN persona to the SAFE persona.
Outputs were constrained to JSON and automatically scored for test suggestions, safety violations, and monotonicity violations.
The LEAN baseline produced safety violations in 14.3% of scenarios and monotonicity violations in 62.5% of counterfactual comparisons.
SAFE personas had no safety or monotonicity violations but suggested more tests (3.38-4.32 on average).
Routing eliminated safety violations and reduced monotonicity violations to 12.5% (ROUTERR1) or 0% (ROUTERR2CF) while keeping test suggestions low (2.14-2.21 on average).
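
The percentages in these key points follow directly from the counts reported in the abstract (8/56, 10/16, 2/16, and so on). As a minimal sketch of that arithmetic, the helper below converts a violation count into a percentage rounded to one decimal place; the function name `violation_rate` is hypothetical and is not taken from the paper.

```python
def violation_rate(violations: int, total: int) -> float:
    """Return the violation rate as a percentage, rounded to one decimal.

    Hypothetical helper for illustration; the counts passed in below are
    the ones quoted in the abstract.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= violations <= total:
        raise ValueError("violations must be between 0 and total")
    return round(100 * violations / total, 1)


# Counts quoted in the abstract.
lean_safety = violation_rate(8, 56)          # LEAN baseline, safety
lean_monotonicity = violation_rate(10, 16)   # LEAN baseline, monotonicity
router_r1_monotonicity = violation_rate(2, 16)  # ROUTERR1, monotonicity
safe_safety = violation_rate(0, 56)          # SAFE personas, safety

print(lean_safety, lean_monotonicity, router_r1_monotonicity, safe_safety)
```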

Abstract

Objective: This study evaluated whether persona-routing strategies that combine a LEAN persona with a SAFE persona reduce rule-defined contraindication and sequence safety failures, as well as counterfactual monotonicity violations, in simulated emergency clinical reasoning. It also aimed to determine whether such routing can achieve these reductions while keeping resource use close to the lean baseline.

Methods: Using a single ChatGPT deployment (GPT-5.2 Pro; accessed January 2026; San Francisco, CA: OpenAI), we collected Japanese-language persona outputs for 28 synthetic emergency vignettes (56 scenario-level runs) and eight Base/Worse counterfactual pairs (16 comparisons per strategy). We compared four personas (high/low time pressure × LEAN/SAFE) and three routing strategies that escalated from the high-time-pressure LEAN persona (PHL) to the high-time-pressure SAFE persona (PHS) via red flags, dual-run auditing, and optional arbitration. Outputs were constrained to structured JavaScript Object Notation (JSON) and automatically scored for test suggestions, discharge safety-net specificity (0-5), contraindication/sequence safety violations (severity 0-3), and monotonicity violations. Routing outcomes were evaluated as deterministic offline simulations over stored persona outputs; accordingly, reported call counts are simulated expected large language model (LLM) calls rather than separately logged live controller calls.

Results: The lean baseline suggested the fewest tests (mean: 1.95) but produced safety violations in 8/56 scenarios (14.3%) and monotonicity violations in 10/16 comparisons (62.5%). SAFE personas had 0/56 safety violations and 0/16 monotonicity violations, but suggested more tests (means: 3.38-4.32). Routing eliminated safety violations and reduced monotonicity violations (ROUTERR1 2/16; ROUTERR2CF 0/16) while keeping test counts near the lean baseline in the main router simulation (means: 2.14-2.21) with modest simulated call overhead (ROUTERR1 1.21 calls; ROUTERR2AUDIT 2.00 calls).

Conclusions: In this single-model, synthetic evaluation, persona routing was associated with fewer rule-defined safety violations (from 14.3% to 0%) and fewer monotonicity violations (62.5% with LEAN prompting vs. 12.5% with ROUTERR1 and 0% with ROUTERR2CF) while preserving low test-suggestion counts.
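
The routing idea described in the Methods can be illustrated with a small offline simulation over stored outputs. The sketch below is only an assumption-laden illustration, not the authors' controller: the `StoredOutput` schema, the `red_flag` field, and the rule "one call if the LEAN output is kept, two calls if the case escalates to the SAFE output" are all hypothetical, and the toy data are invented purely to make the example runnable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredOutput:
    """One stored persona output for a scenario (hypothetical schema)."""
    tests: tuple[str, ...]   # suggested tests
    red_flag: bool = False   # hypothetical escalation trigger


def route_red_flag(lean: StoredOutput, safe: StoredOutput) -> tuple[StoredOutput, int]:
    """Pick an output and return it with the simulated number of LLM calls.

    Assumed rule: keep the LEAN output (1 call) unless it raises a red
    flag, in which case escalate to the SAFE output (2 calls in total).
    """
    if lean.red_flag:
        return safe, 2
    return lean, 1


def simulate(pairs: list[tuple[StoredOutput, StoredOutput]]) -> tuple[float, float]:
    """Return (mean suggested tests, mean simulated calls) over all scenarios."""
    if not pairs:
        raise ValueError("no scenarios to simulate")
    routed = [route_red_flag(lean, safe) for lean, safe in pairs]
    mean_tests = sum(len(out.tests) for out, _ in routed) / len(routed)
    mean_calls = sum(calls for _, calls in routed) / len(routed)
    return mean_tests, mean_calls


# Invented toy data: four scenarios, one of which raises a red flag.
safe_out = StoredOutput(tests=("ECG", "troponin", "CBC", "chest X-ray"))
pairs = [
    (StoredOutput(tests=("ECG", "troponin")), safe_out),
    (StoredOutput(tests=("ECG", "troponin")), safe_out),
    (StoredOutput(tests=("ECG", "troponin")), safe_out),
    (StoredOutput(tests=("ECG", "troponin"), red_flag=True), safe_out),
]
mean_tests, mean_calls = simulate(pairs)
print(mean_tests, mean_calls)
```

Because the simulation replays stored outputs deterministically, the call count here is a computed expectation rather than a log of live calls, which mirrors how the abstract describes its reported call counts.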

Yuusuke Harada studied this question.

synapsesocial.com/papers/69ec59c688ba6daa22dab70d https://doi.org/10.7759/cureus.107548
