What question did this study set out to answer?

The aim is to develop a platform for understanding AI language models' behavior under cognitive pressure.

April 24, 2026 | Open Access

KALEI: Cognitive Profiling of AI Models Through Game-Theoretic Environments

Key Points

The aim is to develop a platform for understanding AI language models' behavior under cognitive pressure.
Developed KALEI, a platform built on game-theoretic environments derived from gambling scenarios.
Measured AI performance across ten dimensions including risk tolerance and cooperation.
Introduced a composite score (Cognum) and used it to profile 19 AI models from 10 laboratories.
Humans outperformed AI on strategic depth, risk calibration, and temporal reasoning, while AI led on cooperation and resource management.
A new conflict scorer (Conflict v2) revealed a 44.6-point spread in EV-rationality across ranked agents, challenging prior assumptions about AI rationality.
The smaller Claude Sonnet 4.6 surpassed the flagship Claude Opus 4.6 on the Cognum composite (58.10 vs 55.72), driven by advantages in Conflict and Temporal Reasoning.

Abstract

I present KALEI, a platform for cognitive profiling of AI language models using game-theoretic environments derived from gambling scenarios. Unlike traditional benchmarks that measure correctness, and unlike contemporary multidimensional frameworks such as Microsoft’s ADeLe (Zhou et al., 2025) that score models against annotated task demands, KALEI measures how models behave under live cognitive pressure across ten dimensions: risk tolerance, bias susceptibility, pattern recognition, cooperation, learning speed, strategic depth, temporal reasoning, resource management, information processing, and conflict. I introduce Cognum (CQ), a composite score calibrated via sigmoid normalization and validated against a random baseline (CQ 38.32). I profile 19 models across 10 AI laboratories and report three results. First, a human baseline study (n = 14) reveals complementary profiles: humans lead on strategic depth, risk calibration, and temporal reasoning; AI leads on cooperation and resource management. Second, a conflict dimension scorer (Conflict v2), introduced after a publicly retracted placeholder that had produced a false “universal 15.0 blind spot”, reveals a 44.6-point spread across ranked agents on EV-rationality in structured dilemmas and inverts the “AI rational, humans emotional” stereotype on delayed rewards (humans 73% patient vs AI 53%). Third, under Cognum v1.2, the smaller Claude Sonnet 4.6 overtakes the flagship Claude Opus 4.6 on the composite (58.10 vs 55.72), driven by advantages of 27 points on Conflict and 25 points on Temporal Reasoning. This is the first KALEI measurement in which a smaller sibling leads the flagship within a single architectural family. I propose the compression hypothesis, that capacity pressure teaches a discipline abundance does not, as a falsifiable direction for further study. The platform is live at https://kaleiai.com.
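
The abstract describes Cognum as a composite calibrated via sigmoid normalization but does not give the formula. The following is a minimal illustrative sketch of how such a composite could be assembled, not KALEI's actual scoring code. The function names, the logistic `midpoint` and `steepness` parameters, the 0 to 100 output range, and the equal weighting across dimensions are all assumptions made for the example; only the ten dimension names come from the text.

```python
import math

# The ten dimensions named in the abstract.
DIMENSIONS = [
    "risk_tolerance", "bias_susceptibility", "pattern_recognition",
    "cooperation", "learning_speed", "strategic_depth",
    "temporal_reasoning", "resource_management",
    "information_processing", "conflict",
]


def sigmoid_normalize(raw, midpoint=50.0, steepness=0.1):
    """Map a raw score onto (0, 100) with a logistic curve.

    midpoint and steepness are hypothetical calibration parameters.
    """
    return 100.0 / (1.0 + math.exp(-steepness * (raw - midpoint)))


def composite_score(raw_scores, weights=None):
    """Weighted mean of sigmoid-normalized dimension scores.

    Equal weights are assumed when none are given.
    """
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}
    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted_sum = sum(
        weights[d] * sigmoid_normalize(raw_scores[d]) for d in DIMENSIONS
    )
    return weighted_sum / total_weight


if __name__ == "__main__":
    # A hypothetical agent sitting at the midpoint on every dimension.
    flat_profile = {d: 50.0 for d in DIMENSIONS}
    print(composite_score(flat_profile))
```

Under these assumptions, the logistic curve compresses extreme raw scores, so a single outlier dimension cannot dominate the composite. A random-play baseline, like the one the abstract reports, would be scored by passing a random agent's raw dimension scores through the same function.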
