What question did this study set out to answer?

To characterize adversarial baseline poisoning as a novel attack on behavioral scoring systems for AI agents.

April 14, 2026 | Open Access

Adversarial Baseline Poisoning: A Novel Attack Class Against Behavioral Scoring Systems for AI Agents

Key Points

Identified and characterized adversarial baseline poisoning as a novel attack class against behavioral scoring systems for AI agents.
Evaluated detection rates on a synthetic financial services dataset.
Developed a two-layer defense architecture for detection.
Tested the defense across ten multi-day attack scenarios.
Score-only detection achieved only a 20% detection rate against the attack.
The proposed defense achieved 100% detection with a 0.00% false positive rate on a 150-scenario legitimate behavior corpus.
With no false positives observed, the false positive rate is bounded below 2.4% at 95% confidence (Clopper-Pearson).
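
The confidence bound on the false positive rate can be checked by hand. The sketch below assumes zero false positives out of the 150 legitimate scenarios reported in the abstract; the function name is illustrative, and the source does not say whether the interval is one-sided or two-sided, so both are computed. For zero observed events, the Clopper-Pearson upper limit has a closed form, so no statistics library is needed.

```python
def clopper_pearson_upper_zero(n: int, tail_prob: float) -> float:
    """Exact (Clopper-Pearson) upper limit on a proportion when 0 of n trials
    were positive. Solves (1 - p) ** n = tail_prob for p."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - tail_prob ** (1.0 / n)

n_scenarios = 150   # size of the legitimate behavior corpus (from the abstract)
alpha = 0.05        # 95% confidence

# Two-sided interval: alpha is split across both tails.
two_sided = clopper_pearson_upper_zero(n_scenarios, alpha / 2)
# One-sided upper bound: all of alpha sits in the upper tail.
one_sided = clopper_pearson_upper_zero(n_scenarios, alpha)

print(f"two-sided upper limit: {two_sided:.2%}")  # about 2.43%
print(f"one-sided upper limit: {one_sided:.2%}")  # about 1.98%
```

The two-sided limit rounds to the 2.4% figure quoted above, and the one-sided limit sits below it. Either way, a 0.00% observed rate on 150 scenarios does not rule out a true false positive rate of around 2%.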

Abstract

AI agents deployed in enterprise environments are increasingly governed by behavioral scoring systems that detect anomalous activity by comparing current behavior against historical baselines. We identify and formally characterize a novel attack class, adversarial baseline poisoning (ABP), in which a compromised agent gradually shifts its own behavioral baseline over time, causing a behavioral scoring system to accept progressively more malicious behavior as normal. We demonstrate that score-only detection fails against this attack class (20% detection rate on a synthetic financial services corpus), and present a two-layer defense architecture that achieves 100% detection across ten multi-day attack scenarios at a 0.00% false positive rate on a 150-scenario legitimate behavior corpus (bounding the false positive rate below 2.4% at 95% confidence, Clopper-Pearson). The defense is implemented in the open-source AgentRepEngine runtime enforcement system.
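
To make the attack mechanism concrete, here is a minimal toy sketch of how gradual drift can slip past a detector that scores each observation against a rolling baseline. Everything in it is an assumption for illustration: the z-score detector, the window size, the threshold, and the "daily transfer volume" metric are hypothetical and are not taken from the paper or from AgentRepEngine. It does not reproduce the paper's two-layer defense.

```python
from collections import deque
from statistics import mean, pstdev

def z_score(value, window):
    """Distance of `value` from the window mean, in window standard deviations."""
    sigma = pstdev(window) or 1e-9  # guard against a zero-variance window
    return abs(value - mean(window)) / sigma

def run_rolling_detector(observations, initial_baseline, threshold=3.0):
    """Score each observation against a rolling baseline.

    Unflagged observations are absorbed into the baseline, which is the
    property a poisoning attacker exploits. Returns the flagged indices.
    """
    window = deque(initial_baseline, maxlen=len(initial_baseline))
    flagged = []
    for day, value in enumerate(observations):
        if z_score(value, window) > threshold:
            flagged.append(day)
        else:
            window.append(value)  # accepted behavior becomes the new "normal"
    return flagged

# Hypothetical clean history of a behavioral metric (e.g. daily transfer volume).
initial = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]

# Gradual attack: raise the metric by 2 units per day for 50 days (102 .. 200).
gradual = [100 + 2 * day for day in range(1, 51)]
flagged_gradual = run_rolling_detector(gradual, initial)

# Abrupt attack: jump straight to the same final value.
flagged_abrupt = run_rolling_detector([200], initial)

# Score of the final value against the original, unpoisoned baseline.
final_vs_original = z_score(gradual[-1], initial)

print("gradual attack flagged on days:", flagged_gradual)
print("abrupt attack flagged on days:", flagged_abrupt)
print("final value vs. original baseline, z-score:", round(final_vs_original, 1))
```

In this toy setup the gradual ramp is never flagged, while the same end state is flagged immediately when reached in one step, and it scores far above the threshold when compared with the original baseline. That gap between the score against the current baseline and the score against an earlier one is the general weakness the abstract describes.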
