What does this research mean for the field?

In AI judgement systems, confidence scores reflect how strongly a single decision is expressed rather than its behavioural stability, and therefore do not reliably indicate whether a judgement will remain consistent under repeated evaluation. Novelty: contradictory. Consensus alignment: challenges consensus.

What question did this study set out to answer?

This paper investigates whether confidence in AI judgement systems genuinely reflects reliability during repeated evaluations.

May 17, 2026 · Open Access

When Confidence Does Not Indicate Reliability

Key points

Analysis of repeated evaluations of 150 job advertisements for age-related bias.
Evaluation of how confidence scores behave under varying judgement outcomes.
Examination of the Behavioural Evaluation Framework, with a focus on internal signals.
Confidence scores stayed within a narrow band (typically about 0.60–0.62) even though judgements changed across repeated runs in 18.7% of cases.
In some instances, confidence was higher for unstable cases than for stable ones.
Findings suggest that confidence does not reliably indicate judgement stability, so reliability must be assessed through observed behavioural patterns.
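The 18.7% instability figure is a proportion of items whose label changed across runs. The sketch below shows one way to compute such a rate, assuming an item counts as unstable if any two of its repeated runs disagree. The paper's exact criterion and data format are not given here, so the function names and toy labels are hypothetical. It also checks the reported figure against the sample size: of 150 advertisements, only a count of 28 rounds to 18.7% (28 / 150 ≈ 0.1867).

```python
from collections.abc import Sequence


def is_unstable(labels: Sequence[str]) -> bool:
    """Hypothetical criterion: unstable if repeated runs did not all agree."""
    return len(set(labels)) > 1


def instability_rate(runs_per_item: Sequence[Sequence[str]]) -> float:
    """Fraction of items whose label changed at least once across runs."""
    unstable = sum(is_unstable(labels) for labels in runs_per_item)
    return unstable / len(runs_per_item)


# Consistency check on the reported figure: 28 of 150 items rounds to 18.7%.
print(round(28 / 150 * 100, 1))  # 18.7

# Hypothetical toy labels (not study data): 4 items, 3 runs each, 1 item flips.
toy = [
    ["Potentially Biased", "Potentially Biased", "Potentially Biased"],
    ["Potentially Biased", "Unclear", "Potentially Biased"],
    ["Unclear", "Unclear", "Unclear"],
    ["Unclear", "Unclear", "Unclear"],
]
print(instability_rate(toy))  # 0.25
```

Under this criterion a single disagreeing run is enough to mark an item unstable; a stricter or looser definition (for example, a majority-label threshold) would change the rate.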

Summary

Description

Beyond the Average Research Series – Working Paper

This working paper examines confidence behaviour in AI judgement systems under repeated evaluation. It builds on the Behavioural Evaluation Framework (Hull, 2026), extending earlier work on judgement stability and non-resolution by examining whether confidence reflects underlying behavioural reliability.

The analysis draws on the Phase 4 behavioural evaluation study within the Agents at Work research series (Hull, 2025–2026), which examines how large language models interpret age-coded language in recruitment text and how those judgements behave when the same evaluative task is repeated. The paper focuses on how confidence values behave when classification outcomes remain stable and when they vary under identical conditions.

While confidence is commonly interpreted as an indicator of reliability, the analysis shows that confidence values remain comparatively stable even where the underlying judgements change across repeated evaluation. This pattern is examined as a distinct separation between expressed certainty and behavioural stability: rather than indicating whether a judgement remains stable across repeated runs, confidence reflects how strongly a decision is expressed in a single instance.

Together with earlier findings on judgement variation and non-resolution, this work extends the Behavioural Evaluation Framework beyond output classification to examine how internal signals behave under repeated observation.

Version note – 1.0

This version presents the initial working-paper release examining confidence behaviour as an internal signal within AI judgement systems under repeated evaluation.

Abstract

Confidence scores are widely used as indicators of reliability in AI judgement systems, and higher confidence is often treated as evidence that a decision is dependable. This paper examines how confidence behaves in repeated evaluations of recruitment text. Building on the Behavioural Evaluation Framework, it asks whether confidence reflects underlying judgement stability under repeated evaluation. Using repeated evaluations of 150 job advertisements for potential age-related bias, the findings show that confidence remains highly stable across runs, typically within a narrow band of about 0.60–0.62. This stability persists even where classification outcomes vary across repeated evaluation. In 18.7% of cases, judgements change under identical conditions, most often between adjacent categories such as “Potentially Biased” and “Unclear”. However, confidence does not adjust in response to this variation and in some cases is higher for unstable cases than for stable ones. These results indicate that confidence reflects the strength with which a decision is expressed rather than its behavioural stability. Confidence therefore does not reliably indicate whether a judgement will remain stable under repeated evaluation; reliability must be assessed through observed behavioural patterns rather than through confidence alone.

Note

This paper is released as a working paper to present findings on confidence behaviour within the Behavioural Evaluation Framework. It extends earlier work on judgement stability and non-resolution by examining confidence as an internal signal under repeated evaluation. Future work will examine how confidence behaviour interacts with explanation stability, cross-model comparison, and sensitivity to input variation as part of the ongoing Agents at Work research series.
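The central comparison in the abstract (confidence in stable versus unstable cases, alongside the overall confidence range) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the `Evaluation` record, the helper functions, and every number in the toy data are hypothetical. Stability is read only from whether the labels agree across runs, and confidence is then summarised separately for each group, which is what makes a gap between the two signals visible.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Evaluation:
    """One item evaluated repeatedly: a label and a confidence per run (hypothetical schema)."""
    item_id: str
    labels: list[str]
    confidences: list[float]

    @property
    def stable(self) -> bool:
        # Behavioural stability: every run returned the same label.
        return len(set(self.labels)) == 1


def confidence_by_stability(evals: list[Evaluation]) -> dict[str, float]:
    """Mean of per-item mean confidence, grouped into stable and unstable items."""
    groups: dict[str, list[float]] = {"stable": [], "unstable": []}
    for e in evals:
        key = "stable" if e.stable else "unstable"
        groups[key].append(mean(e.confidences))
    return {k: mean(v) for k, v in groups.items() if v}


def confidence_range(evals: list[Evaluation]) -> tuple[float, float]:
    """Lowest and highest confidence observed across all runs of all items."""
    values = [c for e in evals for c in e.confidences]
    return min(values), max(values)


# Hypothetical toy data (not study results): confidence stays in a narrow
# band whether or not the label flips between runs.
toy = [
    Evaluation("ad-1", ["Potentially Biased"] * 3, [0.60, 0.61, 0.60]),
    Evaluation("ad-2", ["Unclear"] * 3, [0.61, 0.60, 0.61]),
    Evaluation("ad-3", ["Potentially Biased", "Unclear", "Potentially Biased"],
               [0.62, 0.61, 0.62]),
]

print(confidence_range(toy))  # (0.6, 0.62)
print(confidence_by_stability(toy))
```

In the toy data the unstable item (`ad-3`) has a slightly higher mean confidence than the stable items, while every confidence sits between 0.60 and 0.62, so no confidence threshold alone could single out the item whose label flipped.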
