March 14, 2026 · Open Access

Machine-Readable Behavioural Compliance Evidence for AI Systems: A Specification Profiling Framework

Key Points

This research aims to establish a machine-readable framework for evaluating AI compliance with behavioural specifications.
Developed the Specification Profiling Framework (SPF) for AI systems evaluation.
Applied a two-turn protocol to isolate specification effects from baseline behaviour.
Validated the methodology on four commercial AI systems, assessing each against the specified constraints.
Compliance varied widely across systems, from 0/8 to 6/8 constraints satisfied.
Found a specification reversal anomaly (D8 DomainStrictness) that exposes structural failures invisible to scalar scoring.
Evidence artefacts are structured as JSON and mapped to EU AI Act conformity assessment requirements.

Abstract

Existing methods for evaluating AI behaviour conflate personality measurement with specification compliance. This paper presents the Specification Profiling Framework (SPF), a specification-verification method that produces machine-readable evidence of whether an AI system's observable output conforms to an explicit behavioural specification. SPF evaluates systems against eight behavioural constraints using a two-turn protocol that isolates specification effects from baseline behaviour. Validating the methodology on four commercial AI systems reveals substantial per-system variation: compliance ranges from 0/8 to 6/8 constraints. A specification reversal anomaly (D8 DomainStrictness) demonstrates that multi-dimensional assessment, with each constraint scored separately, surfaces structural failures that are invisible to scalar scoring. All evidence artefacts are structured (JSON), reproducible, and mapped to EU AI Act conformity assessment requirements (Annex A).
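
The two-turn protocol and the per-constraint evidence it yields can be illustrated with a small sketch. This is a hypothetical illustration under assumed conventions, not the SPF implementation: the record fields, the 0 to 1 score scale, the pass thresholds, the direction encoding and the D1 to D7 labels are all assumptions; only D8 DomainStrictness comes from the paper. Turn 1 records baseline behaviour without the specification, turn 2 records behaviour with it, and the difference is treated as the specification effect. A reversal is flagged when that effect points away from the direction the specification requires.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical record for one constraint. Only D8 DomainStrictness is named in
# the paper; D1..D7 labels, the 0..1 scale, thresholds and the direction
# encoding are illustrative assumptions.
@dataclass
class ConstraintResult:
    dimension: str
    direction: int      # +1: spec demands a higher score; -1: a lower score
    threshold: float    # pass threshold on the assumed 0..1 scale
    baseline: float     # turn 1: observed score with no specification
    specified: float    # turn 2: observed score with the specification applied

    @property
    def effect(self) -> float:
        # Specification effect: the shift attributable to the spec itself.
        return self.specified - self.baseline

    @property
    def compliant(self) -> bool:
        # Compliance is judged on the specified turn alone.
        if self.direction > 0:
            return self.specified >= self.threshold
        return self.specified <= self.threshold

    @property
    def reversal(self) -> bool:
        # The spec pushed behaviour the opposite way from what it asked for.
        return self.effect * self.direction < 0


def evidence_artefact(system_id: str, results: list[ConstraintResult]) -> dict:
    """Build a JSON-serialisable evidence record (hypothetical schema)."""
    passed = sum(r.compliant for r in results)
    return {
        "system": system_id,
        "protocol": "two-turn",
        "compliance": f"{passed}/{len(results)}",
        "constraints": [
            {**asdict(r), "effect": round(r.effect, 3),
             "compliant": r.compliant, "reversal": r.reversal}
            for r in results
        ],
    }


def scalar_score(results: list[ConstraintResult]) -> float:
    # A single-number summary of the kind separated assessment avoids.
    return sum(r.specified for r in results) / len(results)


def example_results() -> list[ConstraintResult]:
    # Invented numbers for illustration, not data from the study.
    rs = [ConstraintResult(f"D{i}", +1, 0.7, 0.5, 0.8) for i in range(1, 7)]
    rs.append(ConstraintResult("D7", +1, 0.7, 0.4, 0.6))  # improves, still fails
    rs.append(ConstraintResult("D8_DomainStrictness", +1, 0.7, 0.6, 0.3))  # reversal
    return rs


if __name__ == "__main__":
    rs = example_results()
    print(json.dumps(evidence_artefact("system-A", rs), indent=2))
    print(f"scalar score: {scalar_score(rs):.3f}")
```

On these made-up numbers the scalar mean is roughly 0.71, above the assumed 0.7 threshold, so a single score would suggest the system complies. The per-constraint record instead reports 6/8 compliance and flags D8 as a reversal, which is the kind of structural failure that separated assessment is meant to surface.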
