Objective: This study evaluated whether persona-routing strategies that combine a LEAN persona with a SAFE persona reduce rule-defined contraindication and sequence safety failures, as well as counterfactual monotonicity violations, in simulated emergency clinical reasoning while keeping efficiency (resource use) close to the lean baseline.

Methods: Using a single ChatGPT deployment (GPT-5.2 Pro; accessed January 2026; San Francisco, CA: OpenAI), we collected Japanese-language persona outputs for 28 synthetic emergency vignettes (56 scenario-level runs) and eight Base/Worse counterfactual pairs (16 comparisons per strategy). We compared four personas (high/low time pressure × LEAN/SAFE) and three routing strategies that escalated from the high-time-pressure LEAN persona (PHL) to the high-time-pressure SAFE persona (PHS) via red flags, dual-run auditing, and optional arbitration. Outputs were constrained to structured JavaScript Object Notation (JSON) and automatically scored for test suggestions, discharge safety-net specificity (0-5), contraindication/sequence safety violations (severity 0-3), and monotonicity violations. Routing outcomes were evaluated as deterministic offline simulations over stored persona outputs; accordingly, reported call counts are simulated expected large language model (LLM) calls rather than separately logged live controller calls.

Results: The lean baseline suggested the fewest tests (mean: 1.95) but produced safety violations in 8/56 scenarios (14.3%) and monotonicity violations in 10/16 comparisons (62.5%). SAFE personas had 0/56 safety violations and 0/16 monotonicity violations, but suggested more tests (means: 3.38-4.32). Routing eliminated safety violations and reduced monotonicity violations (ROUTERR1 2/16; ROUTERR2CF 0/16) while keeping test counts near the lean baseline in the main router simulation (means: 2.14-2.21) with modest simulated call overhead (ROUTERR1 1.21 calls; ROUTERR2AUDIT 2.00 calls).

Conclusions: In this single-model, synthetic evaluation, persona routing was associated with fewer rule-defined safety violations (14.3% with LEAN prompting vs. 0% with routing) and fewer monotonicity violations (62.5% with LEAN prompting vs. 12.5% with ROUTERR1 and 0% with ROUTERR2CF) while preserving low test-suggestion counts.
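The offline routing simulation described in the Methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the study's actual code: the `PersonaOutput` and `Scenario` structures, the boolean `red_flag` field, the rule that an escalated scenario costs two simulated calls (one LEAN call plus one SAFE call), and the toy scenarios themselves. It only shows how a red-flag router could be replayed over stored persona outputs to obtain mean test counts, simulated expected calls, and violation counts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PersonaOutput:
    """Stored, already-scored output of one persona (hypothetical schema)."""
    tests: list[str]        # suggested tests
    safety_severity: int    # 0-3; 0 means no rule-defined violation


@dataclass
class Scenario:
    """One scenario-level run with stored LEAN and SAFE outputs (hypothetical)."""
    scenario_id: str
    red_flag: bool          # assumed to be precomputed by a rule set
    lean: PersonaOutput     # stored PHL output
    safe: PersonaOutput     # stored PHS output


def route_red_flag(s: Scenario) -> tuple[PersonaOutput, int]:
    """Return (chosen output, simulated call count) for one scenario.

    Assumption: the router always pays for the LEAN call and pays for one
    additional SAFE call only when a red flag triggers escalation.
    """
    if s.red_flag:
        return s.safe, 2
    return s.lean, 1


def summarize(scenarios: list[Scenario]) -> dict[str, float]:
    """Replay the router over stored outputs; no live model calls are made."""
    routed = [route_red_flag(s) for s in scenarios]
    n = len(routed)
    return {
        "mean_tests": sum(len(out.tests) for out, _ in routed) / n,
        "mean_calls": sum(calls for _, calls in routed) / n,
        "violations": sum(out.safety_severity > 0 for out, _ in routed),
    }


# Toy data for illustration only; these are NOT the study's vignettes or results.
TOY_SCENARIOS = [
    Scenario("s1", False, PersonaOutput(["ECG"], 0),
             PersonaOutput(["ECG", "troponin", "chest X-ray"], 0)),
    Scenario("s2", True, PersonaOutput(["contrast CT"], 2),
             PersonaOutput(["creatinine", "contrast CT"], 0)),
    Scenario("s3", False, PersonaOutput(["glucose"], 0),
             PersonaOutput(["glucose", "CBC"], 0)),
    Scenario("s4", False, PersonaOutput(["urinalysis", "CBC"], 0),
             PersonaOutput(["urinalysis", "CBC", "lactate"], 0)),
]

if __name__ == "__main__":
    print(summarize(TOY_SCENARIOS))
```

Because the router only selects between stored outputs, the replay is deterministic, and the call count is an expected value derived from the routing rule rather than a logged quantity, which matches the distinction the Methods draw between simulated and live controller calls.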
Yuusuke Harada (Wed,) studied this question.