What question did this study set out to answer?

The research aims to validate the RFC-EVAL-001 protocol for cognitive profiling in artificial intelligence systems.

May 7, 2026 · Open Access

Cognitive Profiling in Artificial Intelligence: Reliability and Validity of the RFC-EVAL-001 Protocol

Key Points

The research aims to validate the RFC-EVAL-001 protocol for cognitive profiling in artificial intelligence systems.
Six state-of-the-art LLMs were evaluated in a complete cross-evaluation design (36 evaluations).
Inter-rater reliability was assessed with the intra-class correlation coefficient (ICC) under a two-way random-effects model.
Bootstrap resampling with 10,000 iterations confirmed the robustness of the reliability estimates.
All four cognitive dimensions showed high reliability (ICC range: 0.79–0.88).
Aggregated reliability (ICC(2,k)) exceeded 0.94 across dimensions.
Differentiated cognitive profiles were observed, with epistemic calibration varying independently from overall performance.

Abstract

As large language models (LLMs) continue to advance, evaluation frameworks must move beyond narrow task accuracy toward structured assessment of higher-order cognitive performance patterns. This study presents and empirically validates RFC-EVAL-001 v1.1, a multidimensional benchmarking protocol designed to assess four cognitive dimensions in artificial intelligence (AI) systems: model complexity, temporal horizon, meta-modeling, and adaptive flexibility. Six state-of-the-art LLMs participated in a complete cross-evaluation design (36 evaluations). Inter-rater reliability was assessed using the intra-class correlation coefficient (ICC(2,5); Shrout & Fleiss, 1979; Koo & Li, 2016). All four dimensions demonstrated high reliability (ICC range: 0.79–0.88), with bootstrap resampling (10,000 iterations) confirming robustness. Aggregated reliability (ICC(2,k)) exceeded 0.94 across dimensions. Results reveal differentiated cognitive profiles across systems and show that epistemic calibration (self-assessment accuracy) varies independently from overall performance. These findings provide preliminary psychometric evidence supporting RFC-EVAL-001 as a reproducible protocol for multidimensional cognitive profiling in artificial systems, pending replication with larger and more diverse samples. The present contribution validates the measurement instrument itself and provides a methodological foundation for cumulative research on cognitive profiling in AI.
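
For readers unfamiliar with the reliability statistics named in the abstract, the following is a minimal sketch of how a two-way random-effects ICC and a bootstrap interval could be computed. It is not the study's analysis code. The function names (icc_two_way_random, bootstrap_icc_ci), the choice to resample evaluated systems (rows) with replacement, and the percentile interval are all assumptions made for illustration. The demo matrix is the textbook example from Shrout and Fleiss (1979), not data from this study.

```python
import numpy as np


def icc_two_way_random(ratings):
    """Return (single-rater ICC(2,1), average-rater ICC(2,k)).

    `ratings` is an n_targets x k_raters matrix with no missing cells.
    Formulas follow the two-way random-effects model of Shrout & Fleiss.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # one mean per rated target
    col_means = x.mean(axis=0)  # one mean per rater

    # Mean squares from the two-way ANOVA decomposition.
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between targets
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between raters
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))        # residual

    icc_single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_average = (msr - mse) / (msr + (msc - mse) / n)
    return float(icc_single), float(icc_average)


def bootstrap_icc_ci(ratings, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ICC(2,1).

    Assumption: targets (rows) are the resampling unit; the source does
    not state which unit the study resampled.
    """
    x = np.asarray(ratings, dtype=float)
    n = x.shape[0]
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        estimates[b] = icc_two_way_random(x[idx])[0]
    lo, hi = np.nanquantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Illustrative data only: 6 targets rated by 4 judges (Shrout & Fleiss, 1979).
demo = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

single, average = icc_two_way_random(demo)
print(round(single, 2), round(average, 2))  # 0.29 0.62
print(bootstrap_icc_ci(demo, n_boot=2000))
```

In a design like the one described, one such matrix would be built per cognitive dimension, with evaluated systems as rows and evaluating systems as columns. With only six targets, bootstrap intervals are expected to be wide, which is consistent with the abstract's call for replication on larger samples.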
