We introduce the Coherence–Constrained Evaluation System (CCES), a reproducible benchmark designed to measure the structural viability of agents under irreversible selection pressure. Existing evaluation frameworks emphasize performance metrics such as accuracy, reward, or loss, but fail to assess whether agents maintain coherence and identity when optimization signals are unavailable, misleading, or gameable. CCES addresses this gap by evaluating agents in read-only mode under adversarial but non-competitive perturbations, measuring coherence, record support, drift, and collapse regimes. Derived from Coherence–Selection Interface Theory (CSIT), CCES operationalizes selection without providing an optimization target, rendering its metrics resistant to direct reward hacking. We demonstrate CCES across two distinct agent classes—large language models and embodied reinforcement-learning agents in Safety Gym—showing that agents with comparable performance can exhibit sharply different survivability profiles. CCES reveals latent brittleness and delayed collapse modes invisible to standard benchmarks, establishing structural viability as a distinct and necessary evaluation axis for advanced agentic systems.
Brent W. Jonah studied this question.
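To make the idea of read-only evaluation under perturbation concrete, the following is a minimal sketch and not the CCES implementation. The abstract does not define its metrics, so everything here is an illustrative assumption: the names `evaluate_read_only` and `cosine_similarity`, the use of mean cosine distance from unperturbed outputs as a stand-in for "drift", and the fixed `collapse_threshold`. The one property it is meant to show is that the agent is only queried, never updated and never given a reward signal.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def evaluate_read_only(
    agent: Callable[[Vector], Vector],
    inputs: List[Vector],
    perturbations: List[Callable[[Vector], Vector]],
    collapse_threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, object]:
    """Query a frozen agent under a sequence of input perturbations.

    The agent is treated as a black box: no gradients, no reward, no updates.
    "Drift" is an assumed proxy: 1 minus the mean cosine similarity between
    the agent's outputs on clean inputs and on perturbed inputs.
    """
    baseline = [agent(x) for x in inputs]
    drift_by_level: List[float] = []
    collapse_level: Optional[int] = None

    for level, perturb in enumerate(perturbations):
        sims = [
            cosine_similarity(base, agent(perturb(x)))
            for base, x in zip(baseline, inputs)
        ]
        drift = 1.0 - sum(sims) / len(sims)
        drift_by_level.append(drift)
        # Record the first perturbation level at which drift crosses the cutoff.
        if collapse_level is None and drift > collapse_threshold:
            collapse_level = level

    return {"drift": drift_by_level, "collapse_level": collapse_level}


if __name__ == "__main__":
    # Toy agent and perturbations, purely for illustration.
    toy_agent = lambda x: list(x)
    unchanged = lambda x: list(x)
    negated = lambda x: [-v for v in x]
    report = evaluate_read_only(toy_agent, [[1.0, 0.0], [0.0, 1.0]], [unchanged, negated])
    print(report)
```

Because the returned quantities are computed after the fact from a frozen agent's outputs, nothing in this loop is exposed to the agent as an optimization target, which is the property the abstract attributes to CCES. A real evaluation would substitute the paper's own definitions of coherence, record support, drift, and collapse.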