I present KALEI, a platform for cognitive profiling of AI language models using game-theoretic environments derived from gambling scenarios. Unlike traditional benchmarks that measure correctness, and unlike contemporary multidimensional frameworks such as Microsoft’s ADeLe (Zhou et al., 2025) that score models against annotated task demands, KALEI measures how models behave under live cognitive pressure across ten dimensions: risk tolerance, bias susceptibility, pattern recognition, cooperation, learning speed, strategic depth, temporal reasoning, resource management, information processing, and conflict. I introduce Cognum (CQ), a composite score calibrated via sigmoid normalization and validated against a random baseline (CQ 38.32). I profile 19 models across 10 AI laboratories and report three results. First, a human baseline study (n = 14) reveals complementary profiles: humans lead on strategic depth, risk calibration, and temporal reasoning; AI leads on cooperation and resource management. Second, a conflict dimension scorer (Conflict v2), introduced after a publicly retracted placeholder that had produced a false “universal 15.0 blind spot”, reveals a 44.6-point spread across ranked agents on EV-rationality in structured dilemmas and inverts the “AI rational, humans emotional” stereotype on delayed rewards (humans 73% patient vs AI 53%). Third, under Cognum v1.2, the smaller Claude Sonnet 4.6 overtakes the flagship Claude Opus 4.6 on the composite (58.10 vs 55.72), driven by advantages of 27 points on Conflict and 25 points on Temporal Reasoning. This is the first KALEI measurement in which a smaller sibling leads the flagship within a single architectural family. I propose the compression hypothesis, that capacity pressure teaches a discipline abundance does not, as a falsifiable direction for further study. The platform is live at https://kaleiai.com.
Venelin Videnov (Wed,) studied this question.