May 1, 2026 | Open Access

AKRM-Bench: A Reproducible Evaluation Protocol for Graded Instability and Proper Exit in Hallucination Control

Key Points

Establishes a reproducible protocol for evaluating hallucination-aware language model inference under graded uncertainty.
Defines a unified action space for model responses (Answer, Clarify, Refuse, Proper Exit).
Specifies formal metrics for evaluation including reliability score, instability functional, and refusal rate.
Outlines benchmark design principles and reporting templates for reproducibility.
Defers numerical performance results to future releases, to be reported only after evaluation on fixed splits with documented models, thresholds, annotation procedures, and trace logs.
Makes no empirical performance claims, since definitive results are not yet available.
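
The four-way action space listed above can be pictured as a small enumeration plus a per-action rate computed over a log of controller decisions. The following is a minimal sketch, not the protocol's reference implementation: the names `Action` and `action_rates` are hypothetical, and defining the refusal rate as the plain fraction of responses that chose Refuse is an assumption, since the formal definition lives in the manuscript rather than in this record.

```python
from collections import Counter
from enum import Enum


class Action(Enum):
    """The four controller actions named in the protocol summary."""
    ANSWER = "answer"
    CLARIFY = "clarify"
    REFUSE = "refuse"
    PROPER_EXIT = "proper_exit"


def action_rates(actions):
    """Return the fraction of logged responses that took each action.

    Assumption: every rate is a simple proportion over all responses.
    """
    if not actions:
        raise ValueError("need at least one logged action")
    counts = Counter(actions)
    # Counter returns 0 for actions that never occurred.
    return {action: counts[action] / len(actions) for action in Action}


# Illustrative log only; these are not benchmark results.
log = [Action.ANSWER, Action.REFUSE, Action.CLARIFY, Action.REFUSE]
rates = action_rates(log)
print(rates[Action.REFUSE])  # 0.5
```

Reporting all four rates together, rather than the refusal rate alone, makes it visible when a controller lowers its refusal rate simply by shifting responses into another non-answer action.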

Abstract

This record contains the v1.0 preprint and Overleaf-ready reproducibility package for AKRM-Bench: A Reproducible Evaluation Protocol for Graded Instability and Proper Exit in Hallucination Control. AKRM-Bench is a benchmark and reporting protocol for evaluating hallucination-aware language model inference under graded uncertainty. Building on AKRM-RIR, the protocol treats hallucination control as a decision problem under uncertainty rather than a purely output-level factuality task. It defines a unified action space consisting of Answer, Clarify, Refuse, and Proper Exit, and evaluates controller behavior across answerable, unanswerable, ambiguous, and hallucination-trap prompts. The protocol specifies formal metrics including the epistemic reliability score μₜ, the instability functional K(μ) = 4μ(1 − μ), refusal rate, Proper Exit accuracy, hallucination-risk score, answer utility, calibration error, and latency overhead. It also defines benchmark design principles, baseline decoding strategies, AKRM controller variants, ablation structure, reporting templates, failure-mode analysis, calibration diagnostics, trace logs, and reproducibility-package requirements. This release includes the preprint PDF, LaTeX source, BibTeX references, README, license information, and an Overleaf-compatible repository zip. The manuscript is a protocol and reporting framework; it does not claim definitive empirical performance results. Numerical benchmark results should be reported only after running fixed evaluation splits with documented models, thresholds, annotation procedures, and trace logs.

Version: v1.0
Status: Preprint / Not peer reviewed
Artifact type: Evaluation protocol and Overleaf-ready reproducibility package
Code status: Protocol repository structure included; full benchmark execution scripts intended for future release
License: CC-BY 4.0 for paper, LaTeX, documentation, and benchmark protocol materials; MIT License recommended for future code components
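
The instability functional given in the abstract, K(μ) = 4μ(1 − μ), is easy to evaluate directly. The sketch below assumes that the reliability score μ lies in the interval [0, 1]; the record does not state this range, and the function name `instability` is hypothetical. Under that assumption K is 0 at μ = 0 and μ = 1, reaches its maximum of 1 at μ = 0.5, and is symmetric about that midpoint.

```python
def instability(mu: float) -> float:
    """Evaluate K(mu) = 4 * mu * (1 - mu).

    Assumption: mu is a reliability score in [0, 1]; values outside
    that range are rejected here rather than silently accepted.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return 4.0 * mu * (1.0 - mu)


# Tabulate K at a few illustrative scores (not measured data).
for mu in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"mu={mu:.2f}  K={instability(mu):.2f}")
```

One reading of this shape, offered as an interpretation rather than a claim from the manuscript, is that instability is highest when the reliability score is least decisive and vanishes when the score is confidently low or confidently high.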
