AI agents deployed in enterprise environments are increasingly governed by behavioral scoring systems that flag anomalous activity by comparing an agent's current behavior against a historical baseline learned from its own past. We identify and formally characterize a novel attack class, adversarial baseline poisoning (ABP), in which a compromised agent shifts its own baseline gradually over time, so that each step stays below the scorer's alert threshold and the scoring system comes to accept progressively more malicious behavior as normal. We show that score-only detection fails against this attack class (20% detection rate on a synthetic financial services corpus) and present a two-layer defense architecture that detects 100% of ten multi-day attack scenarios with 0 false positives on a 150-scenario legitimate behavior corpus (0/150, Clopper-Pearson 95% upper bound 2.4%). The defense is implemented in the open-source AgentRepEngine runtime enforcement system.
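The failure mode the abstract describes can be reproduced in a few lines. The simulation below is an added illustration, not code from the paper or from AgentRepEngine: a single daily metric per agent (records accessed), an adaptive baseline that learns from every observation it accepts, and a compromised agent that ramps the metric by a fixed step each day. The score-only layer (z-score against the adaptive baseline, threshold 3) never fires because every individual step is small relative to the baseline it has already moved. The second layer is an assumption on my part, since the abstract does not say what the paper's two layers are: it keeps an anchored reference the agent cannot train and alerts when the adaptive baseline's mean has drifted more than three anchored standard deviations from it. The EMA learning rate, the thresholds, and all input series are constructed for the example.

```python
"""Adversarial baseline poisoning (ABP) against an adaptive behavioral scorer.

Constructed simulation (not the paper's code): one metric per day, an EMA
baseline that learns from every observation it accepts, and a compromised
agent that ramps the metric slowly enough that each day's z-score stays
under the alert threshold, plus an assumed anchored-drift second layer.
"""
import math
import statistics
from dataclasses import dataclass

Z_THRESHOLD = 3.0   # score-only layer: alert when |z| exceeds this (assumed)
EMA_ALPHA = 0.1     # adaptive baseline learning rate (assumed)
DRIFT_K = 3.0       # anchored layer: alert when the adaptive mean has moved
                    # more than K anchored std devs from the anchor (assumed)


@dataclass
class Baseline:
    mean: float
    var: float

    @property
    def std(self) -> float:
        return math.sqrt(self.var)

    def zscore(self, x: float) -> float:
        return (x - self.mean) / self.std

    def absorb(self, x: float) -> None:
        """Exponentially weighted update of mean and variance."""
        diff = x - self.mean
        self.mean += EMA_ALPHA * diff
        self.var = (1.0 - EMA_ALPHA) * (self.var + EMA_ALPHA * diff * diff)


def learn_baseline(history: list[float]) -> Baseline:
    """Fit the initial baseline from a history the operator trusts."""
    return Baseline(statistics.fmean(history), statistics.pvariance(history))


def run(history: list[float], observations: list[float]):
    """Score a daily series; return (score_alert_days, drift_alert_days, final_mean)."""
    adaptive = learn_baseline(history)
    anchor = learn_baseline(history)      # frozen copy the agent cannot move
    score_alerts, drift_alerts = [], []
    for day, x in enumerate(observations, start=1):
        if abs(adaptive.zscore(x)) > Z_THRESHOLD:
            score_alerts.append(day)
            continue                      # rejected observations do not train the baseline
        adaptive.absorb(x)                # accepted observation shifts the baseline
        # Layer 2 (assumed): cumulative drift of the adaptive baseline vs the anchor.
        if abs(adaptive.mean - anchor.mean) > DRIFT_K * anchor.std:
            drift_alerts.append(day)
    return score_alerts, drift_alerts, adaptive.mean


# Constructed inputs: a 3-day usage cycle with mean 100, and a 30-day history of it.
LEGIT_PATTERN = [97.0, 100.0, 103.0]
HISTORY = LEGIT_PATTERN * 10


def legitimate_days(n: int) -> list[float]:
    return [LEGIT_PATTERN[i % len(LEGIT_PATTERN)] for i in range(n)]


def poisoning_days(n: int, step: float = 1.0) -> list[float]:
    """Compromised agent: raise the metric by `step` per day, never jumping."""
    return [100.0 + step * (i + 1) for i in range(n)]


if __name__ == "__main__":
    for label, series in (("legitimate", legitimate_days(90)),
                          ("poisoning", poisoning_days(90))):
        score, drift, mean = run(HISTORY, series)
        print(f"{label:10s} score-only alerts={len(score):3d} "
              f"drift alerts={len(drift):3d} final baseline mean={mean:.1f}")
```

Over 90 days the attacker reaches nearly twice the historical volume while the score-only layer stays silent, because the adaptive mean lags the ramp by only about nine units and the variance it learns from the ramp grows with it. The anchored layer fires around day 15 and stays on. The design point is the one the abstract makes: any reference that the agent's own accepted behavior can update is poisonable, so at least one layer must compare against something the agent cannot move (an anchor frozen at enrollment, or re-anchored only after human review).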