What question did this study set out to answer?

This work aims to create a standardized benchmark for measuring the quality of human–AI collaboration.

February 5, 2026 | Open Access

Adaptive Intelligence Systems (AIS): A Replicable Interaction-Level Benchmark for Measuring Human–AI Collaboration Quality v1.0

Key Points

This work aims to create a standardized benchmark for measuring the quality of human–AI collaboration.
It introduces the AIS HCI Benchmark, a replicable protocol for interaction-level evaluation.
The benchmark measures observable process variables such as scalarization latency, interaction cost, convergence dynamics, and adaptive expressive bandwidth.
Evaluation is controlled, blind, and cross-system, and relies only on transcript-level data.
The results show stable, measurable regularities in interaction behavior across different AI systems.
Traditional accuracy-based benchmarks do not capture these collaboration dynamics.

Abstract

Existing large language model benchmarks primarily evaluate model capability in isolation and provide limited visibility into how systems behave during real-time collaboration with human operators. Within the Interactive Intelligence Systems (IIS) framework, Adaptive Intelligence Systems (AIS) function as the measurement layer responsible for instrumenting and empirically evaluating interaction behavior. This work introduces the AIS HCI Benchmark, a replicable interaction-level evaluation protocol that operationalizes human–AI collaboration quality through observable process variables including scalarization latency, interaction cost, convergence dynamics, and adaptive expressive bandwidth. The benchmark emphasizes controlled, blind, and cross-system execution using only transcript-level observations, avoiding reliance on proprietary telemetry or model internals. Results demonstrate that interaction behavior exhibits stable, measurable regularities across heterogeneous systems and that these properties are not captured by traditional accuracy-based benchmarks. The contribution is methodological rather than competitive: the framework provides a lightweight, replication-friendly instrument for measuring collaboration dynamics at the system level. By treating interaction as the primary unit of analysis, the AIS benchmark supports cumulative comparison, reproducibility, and evidence-based evaluation of human–AI systems deployed in real-world settings.
