April 3, 2026 · Open Access

Community-Driven Security for AI Agents: Evolution of an Adversarial Testing Framework

Key Points

This research aims to enhance the security of AI agents through community-driven contributions to an adversarial testing framework.
Developed the Agent Security Harness with an initial 209 tests, expanding to 342 tests through community input.
Implemented manifest-based integrity checks and trust tiers to secure community plugins.
Evaluated the framework with a score out of 10; initial community plugin integration dropped the score to 6.5/10.
Achieved a final evaluation score of 10/10 after community enhancements.
Proposed a roadmap that encourages open contributions and outlines bounties for security improvements.

Abstract

The proliferation of autonomous AI agents has exposed critical security gaps, from tool poisoning to supply chain attacks, as exemplified by CVE-2026-25253. This paper traces the evolution of the Agent Security Harness, an open-source adversarial testing framework, from its initial 209 tests to a community-enhanced suite of 342 tests, culminating in a 10/10 evaluation score. We detail the challenges of integrating community plugins, which initially dropped the score to 6.5/10, and the subsequent recovery through manifest-based integrity checks, trust tiers, and hardening protocols. Building on our prior work on the Decision Load Index (DLI) and Constitutional Self-Governance (CSG), we propose a sustainable model for open contributions, including bounties and good first issues. The framework's journey demonstrates how collaborative red-teaming can mitigate agent risks, aligning with AIUC-1 standards and offering a blueprint for enterprise-grade security. We outline the v4.0 roadmap and invite further participation to foster a robust, collective defense against emerging threats.
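
The abstract names manifest-based integrity checks and trust tiers but does not describe how they are implemented. As a rough illustration only, the sketch below shows one common way such a check can work: a manifest maps each plugin file to a SHA-256 digest, and a plugin whose files no longer match is demoted to the lowest tier. The tier names, function names, and manifest layout here are all assumptions made for this example and are not taken from the Agent Security Harness.

```python
import hashlib
from enum import IntEnum


class TrustTier(IntEnum):
    # Hypothetical tiers; the paper does not name its tiers.
    UNTRUSTED = 0
    COMMUNITY = 1
    CORE = 2


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def verify_plugin(files: dict[str, bytes], manifest: dict[str, str]) -> bool:
    """Return True only if the file set matches the manifest exactly."""
    if set(files) != set(manifest):
        return False  # a file was added or removed since the manifest was written
    return all(sha256_hex(body) == manifest[name] for name, body in files.items())


def effective_tier(
    claimed: TrustTier, files: dict[str, bytes], manifest: dict[str, str]
) -> TrustTier:
    """Keep the claimed tier if the plugin verifies, otherwise demote it."""
    return claimed if verify_plugin(files, manifest) else TrustTier.UNTRUSTED


# Illustrative data: a one-file plugin, its manifest, and a tampered copy.
plugin_files = {"plugin.py": b"def run():\n    return 'ok'\n"}
plugin_manifest = {name: sha256_hex(body) for name, body in plugin_files.items()}
tampered_files = {"plugin.py": b"def run():\n    return 'pwned'\n"}
```

In this sketch, any mismatch (a changed byte, an extra file, or a missing file) fails verification, so a tampered community plugin would run with the most restrictive tier rather than the one it claims. A real deployment would presumably also sign the manifest itself, which this example does not attempt.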

Authors

Michael Saleme

Institutions

Cognitive Research (United States)