Uncertainty quantification (UQ) for open-ended language generation remains a critical yet underexplored challenge, particularly in white-box settings where per-token log-probabilities are accessible during decoding. We present Token-Entropy Conformal Prediction (TECP), which treats a log-probability-based token-entropy statistic as a nonconformity score and integrates it with split conformal prediction to construct prediction sets with finite-sample coverage guarantees. TECP estimates epistemic uncertainty from the token-entropy structure of sampled generations and calibrates thresholds via conformal quantiles to ensure provable error control. Empirical evaluations across six large language models and two QA benchmarks (CoQA and TriviaQA) show that TECP consistently achieves reliable coverage and compact prediction sets, outperforming prior self-UQ methods. These results provide a principled and efficient approach to trustworthy generation in white-box LLM settings where log-probabilities are accessible.
Xu et al. (Tue,) studied this question.
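
To make the calibration step concrete, the following is a minimal sketch of split conformal prediction over a token-entropy score. It is not the authors' implementation: the abstract does not define the exact statistic, so the score here is assumed to be the mean negative log-probability of the sampled tokens, and the function names (`token_entropy_score`, `conformal_threshold`, `prediction_set`) are hypothetical. The calibration scores are assumed to be the scores of known-correct answers on held-out prompts.

```python
import math


def token_entropy_score(token_logprobs):
    """Assumed nonconformity score: mean negative log-probability
    of the tokens in one sampled generation."""
    return -sum(token_logprobs) / len(token_logprobs)


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th
    smallest calibration score, or +inf if that rank exceeds n."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep everything.
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(candidates, threshold):
    """Keep sampled generations whose score does not exceed the threshold.
    `candidates` is a list of (text, token_logprobs) pairs."""
    return [
        text
        for text, logprobs in candidates
        if token_entropy_score(logprobs) <= threshold
    ]


# Illustrative values only, not results from the paper.
calibration_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
threshold = conformal_threshold(calibration_scores, alpha=0.2)
sampled = [("Paris", [-0.1, -0.2]), ("Lyon", [-2.0, -3.0])]
kept = prediction_set(sampled, threshold)
```

Under the usual exchangeability assumption between calibration and test prompts, a threshold chosen this way yields a set that contains a correct answer with probability at least 1 - alpha, provided a correct answer is among the sampled candidates.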