What question did this study set out to answer?

This paper proposes a standardized framework for assessing the coherence of AI systems.

June 14, 2026 | Open Access

Scoring, Standard, v3, conformance

Key Points

Proposes the Cognitive Multi-scale Coherence Index (CMCI) as a standardized framework for assessing AI system coherence.
Defines structural coherence parameters for AI systems, with severity bands.
Draws on analyses adjacent to existing benchmarks (HELM, HarmBench, and SOCRATES) as motivating evidence.
Argues that CMCI captures coherence dimensions not visible in current metrics.
Offers a normalized scoring structure for AI coherence evaluation.
Outlines improved risk communication and system governance as potential outcomes.

Abstract

As artificial intelligence systems become increasingly complex, interconnected, and autonomous, the limitations of existing evaluation metrics become more apparent. Current benchmarks and safety evaluations primarily assess output quality, task performance, or behavioral compliance, but they do not provide a standardized way to measure structural coherence. This creates a critical gap: systems may perform well on benchmarks while remaining fragile, drifting over time, or exhibiting incoherence across interacting scales. This paper proposes the Cognitive Multi-scale Coherence Index (CMCI) as a candidate standardized scoring framework for AI coherence. Inspired by the role of the Common Vulnerability Scoring System (CVSS) in cybersecurity, CMCI is introduced as a shared language for assessing, comparing, and communicating coherence-related system risk. The framework defines coherence as a multi-scale and transversal property of system integrity. It proposes a normalized scoring structure with severity bands, a conformance specification that defines what any implementation must produce, and a calibration protocol for the bands. Building on prior work on Dynamic Coherence Windows and Cognitive Immune Protection, this paper positions CMCI not only as an analytical framework but also as the basis for a common coherence scoring system. We outline its conceptual foundations, proposed scoring logic, candidate severity levels, and motivating evidence from three benchmark-adjacent analyses (HELM, HarmBench, and SOCRATES), each showing that structural coherence captures a dimension not visible through existing metrics alone. The goal of this paper is not to claim a finalized universal standard, but to establish the need, structure, and initial methodological basis for one. A standardized coherence score could improve evaluation transparency, risk communication, and system governance in AI, while providing a practical foundation for future calibration and cross-domain adoption.
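
To make the idea of a normalized score with severity bands concrete, the following is a minimal sketch, not the CMCI specification. The abstract does not state CMCI's bands or thresholds, so the example borrows the qualitative rating scale of CVSS v3.x (None, Low, Medium, High, Critical on a 0 to 10 scale) purely as an analogy. The function names `normalize` and `severity_band`, the 0 to 10 range, and the convention that a higher score means higher severity are all assumptions made for illustration.

```python
# Illustrative only: bands follow the CVSS v3.x qualitative rating scale,
# used here as an analogy. CMCI's actual bands are not given in the abstract.
CVSS_STYLE_BANDS = [
    (0.0, "None"),       # exactly 0.0
    (3.9, "Low"),        # 0.1 to 3.9
    (6.9, "Medium"),     # 4.0 to 6.9
    (8.9, "High"),       # 7.0 to 8.9
    (10.0, "Critical"),  # 9.0 to 10.0
]


def normalize(raw: float, lo: float, hi: float) -> float:
    """Clamp a raw measurement to [lo, hi] and rescale it to 0..10 (one decimal)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(raw, lo), hi)
    return round(10.0 * (clamped - lo) / (hi - lo), 1)


def severity_band(score: float) -> str:
    """Map a normalized 0..10 score to a qualitative band (hypothetical mapping)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must lie in [0, 10]")
    score = round(score, 1)  # bands are defined at one-decimal resolution
    for upper, label in CVSS_STYLE_BANDS:
        if score <= upper:
            return label
    raise AssertionError("unreachable: bands cover the full range")


if __name__ == "__main__":
    s = normalize(0.5, 0.0, 1.0)
    print(s, severity_band(s))
```

A calibration protocol of the kind the abstract mentions would, under this sketch, amount to choosing and justifying the upper bounds in the band table instead of reusing the CVSS values.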
