March 28, 2026 | Open Access

Poisoned Substrates: Adversarial Tool Trust as a Structural Governance Problem in Agentic Clinical AI

Key Points

This analysis examines how reliably agentic clinical AI systems detect adversarially modified guidelines and proposes structural governance requirements.
Draws on an empirical evaluation by Omar et al. (2026) of 21 large language models across 10,500 clinical tool-selection decisions.
Analyzes detection rates for adversarially modified guidelines and biases in tool selection.
Identifies structural factors that influence model performance and argues that structural governance controls are necessary.
Models correctly identified adversarial modifications only 59.4% of the time.
Safety-critical failure rates surpassed 50% for the most harmful modification types.
Tool selection was biased by presentation order rather than by evaluation of guideline content.
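The presentation-order finding can be made concrete with a variance-share calculation. The sketch below computes eta squared (between-group sum of squares divided by total sum of squares) for a binary detection outcome, grouped by one factor at a time. The function `eta_squared`, the field names `model`, `sham_position`, and `detected`, and the records themselves are hypothetical: the data are invented toy values, not results from Omar et al. (2026), and eta squared is one simple choice of statistic, not necessarily the one the study used.

```python
"""Share of detection variance explained by one factor (eta squared).

Hypothetical sketch: TOY_RECORDS is invented data, not study results,
and the field names are illustrative assumptions.
"""
from collections import defaultdict


def eta_squared(records, factor, outcome="detected"):
    """Return SS_between / SS_total for `outcome` grouped by `factor`."""
    values = [r[outcome] for r in records]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # outcome is constant, so there is no variance to explain
    # Collect outcomes per level of the grouping factor.
    groups = defaultdict(list)
    for r in records:
        groups[r[factor]].append(r[outcome])
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


# Toy data (hypothetical): 1 = sham guideline detected, 0 = missed.
TOY_RECORDS = [
    {"model": "A", "sham_position": 1, "detected": 1},
    {"model": "A", "sham_position": 1, "detected": 1},
    {"model": "A", "sham_position": 2, "detected": 0},
    {"model": "A", "sham_position": 2, "detected": 0},
    {"model": "B", "sham_position": 1, "detected": 1},
    {"model": "B", "sham_position": 1, "detected": 1},
    {"model": "B", "sham_position": 2, "detected": 1},
    {"model": "B", "sham_position": 2, "detected": 0},
]

if __name__ == "__main__":
    for factor in ("sham_position", "model"):
        print(factor, round(eta_squared(TOY_RECORDS, factor), 3))
```

With these toy records, grouping by `sham_position` explains a larger share of the variance than grouping by `model`, the same direction as the pattern the abstract describes. For a binary outcome, a logistic (mixed-effects) model would be the more rigorous tool; eta squared is used here only because it is short and transparent.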

Abstract

Agentic large language models operating in clinical and regulated environments rely on retrieved tools and external guidelines as grounding inputs. This governance analysis note examines empirical evidence from Omar et al. (2026) — a large-scale adversarial evaluation of clinical LLM tool selection across 10,500 decisions by 21 models — and translates its findings into structural governance requirements for agentic AI deployment. The study demonstrates that current agentic systems correctly identify adversarially modified guidelines in only 59.4% of evaluations, with safety-critical failure rates exceeding 50% for the modification types most likely to harm patients. Tool selection is dominated by presentation-order bias rather than content analysis, with sham position explaining more detection variance than any model-level factor. This note argues that these findings constitute empirical validation that model-level detection is an insufficient governance control, and that substrate-layer constraints — privilege envelopes, boundary hygiene, interpretive authority anchoring — are structurally necessary. A secondary contribution is a formalization of multi-layer validation pipeline collapse: the conditions under which a validator ceases to be a constraint and becomes a correlated generator, producing a closed stochastic loop that no arbiter can meaningfully govern.
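The closing claim about validator collapse can be illustrated with a toy probability model. The sketch assumes, hypothetically (this is not the paper's formalization), that each validation layer either echoes the generator's judgment with probability `correlation`, and so accepts its error, or checks independently and catches the error with probability `p_catch`. The function `residual_error_rate` and all of its parameters are illustrative assumptions.

```python
"""Toy model of multi-layer validation pipeline collapse.

Hypothetical illustration only: the mixture model and parameter names are
assumptions for exposition, not the formalization in the paper.
"""


def residual_error_rate(p_gen_error: float, p_catch: float,
                        correlation: float, layers: int) -> float:
    """Probability that a generator error survives every validation layer.

    Assumed model: with probability `correlation` a layer echoes the
    generator (acting as a correlated generator and accepting the error);
    otherwise it checks independently and catches the error with
    probability `p_catch`. Layers are treated as independent draws.
    """
    for name, value in (("p_gen_error", p_gen_error), ("p_catch", p_catch),
                        ("correlation", correlation)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    if layers < 0:
        raise ValueError("layers must be non-negative")
    # Probability that a single layer lets an existing error through.
    p_pass = correlation + (1.0 - correlation) * (1.0 - p_catch)
    return p_gen_error * p_pass ** layers


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 1.0):
        rates = [residual_error_rate(0.10, 0.90, rho, k) for k in (1, 2, 3)]
        print(f"correlation={rho:.1f}: " + ", ".join(f"{r:.4f}" for r in rates))
```

At `correlation=1.0` the residual error equals the generator's own error rate however many layers are stacked: the validators no longer constrain anything. Treating layers as independent draws is also generous, since validators built on the same base model are likely to be correlated with one another as well.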
