This record contains the v1.0 preprint and Overleaf-ready reproducibility package for AKRM-Bench: A Reproducible Evaluation Protocol for Graded Instability and Proper Exit in Hallucination Control.

AKRM-Bench is a benchmark and reporting protocol for evaluating hallucination-aware language model inference under graded uncertainty. Building on AKRM-RIR, the protocol treats hallucination control as a decision problem under uncertainty rather than a purely output-level factuality task. It defines a unified action space consisting of Answer, Clarify, Refuse, and Proper Exit, and evaluates controller behavior across answerable, unanswerable, ambiguous, and hallucination-trap prompts.

The protocol specifies formal metrics including the epistemic reliability score μₜ, the instability functional K(μ) = 4μ(1−μ), refusal rate, Proper Exit accuracy, hallucination-risk score, answer utility, calibration error, and latency overhead. It also defines benchmark design principles, baseline decoding strategies, AKRM controller variants, ablation structure, reporting templates, failure-mode analysis, calibration diagnostics, trace logs, and reproducibility-package requirements.

This release includes the preprint PDF, LaTeX source, BibTeX references, README, license information, and an Overleaf-compatible repository zip. The manuscript is a protocol and reporting framework; it does not claim definitive empirical performance results. Numerical benchmark results should be reported only after running fixed evaluation splits with documented models, thresholds, annotation procedures, and trace logs.

Version: v1.0
Status: Preprint / Not peer reviewed
Artifact type: Evaluation protocol and Overleaf-ready reproducibility package
Code status: Protocol repository structure included; full benchmark execution scripts intended for future release
License: CC-BY 4.0 for paper, LaTeX, documentation, and benchmark protocol materials; MIT License recommended for future code components
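As an illustration of the metrics named in the description, the instability functional K(μ) = 4μ(1−μ) and per-action rates over the four-action space can be sketched as follows. This is a minimal sketch, not code from the release package: the names `Action`, `instability`, and `action_rates` are hypothetical, the restriction of μ to [0, 1] is an assumption, and treating refusal rate as the fraction of prompts on which Refuse was chosen is likewise an assumption rather than the protocol's stated definition.

```python
from collections import Counter
from enum import Enum


class Action(Enum):
    """The four controller actions named in the protocol description."""
    ANSWER = "answer"
    CLARIFY = "clarify"
    REFUSE = "refuse"
    PROPER_EXIT = "proper_exit"


def instability(mu: float) -> float:
    """Instability functional K(mu) = 4 * mu * (1 - mu).

    Assumes mu lies in [0, 1] (an assumption, not stated in the record).
    K is 0 at mu = 0 and mu = 1 and reaches its maximum of 1 at mu = 0.5.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return 4.0 * mu * (1.0 - mu)


def action_rates(actions):
    """Fraction of prompts on which each action was taken.

    Hypothetical helper: the refusal rate is read here as the share of
    Action.REFUSE among all logged actions.
    """
    if not actions:
        raise ValueError("need at least one logged action")
    counts = Counter(actions)
    return {action: counts[action] / len(actions) for action in Action}


if __name__ == "__main__":
    for mu in (0.0, 0.25, 0.5, 1.0):
        print(f"mu={mu:.2f}  K={instability(mu):.2f}")

    log = [Action.ANSWER, Action.REFUSE, Action.REFUSE, Action.PROPER_EXIT]
    print(action_rates(log)[Action.REFUSE])
```

Under these assumptions K is symmetric about μ = 0.5, so a reliability score of 0.25 and one of 0.75 yield the same instability value.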
ENES AKIN (Wed,) studied this question.