What question did this study set out to answer?

The aim is to develop a fact-checking system that effectively distinguishes uncertainty types to improve multi-evidence reasoning.

March 26, 2026 · Open Access

Exact Uncertainty Decomposition for Multi-Evidence Fact Verification: A Formal Alternative to LLM-Based Reasoning

Key Points

Introduced Latent Posterior Factors (LPF) for uncertainty decomposition.
Evaluated LPF on synthetic multi-evidence benchmarks compared to LLMs.
Conducted experiments across three settings with various evidence types.
LPF achieved a hallucination rate of 17.0% with 4.0% abstention.
Demonstrated a 58–60% relative reduction in hallucination rate compared to small LLMs that rely on parametric knowledge.
LPF operates 100–500× faster than LLM baselines, processing 21.7 queries/second.
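
The reduction figure in the key points can be checked against the hallucination rates reported in the abstract for the parametric knowledge setting. The snippet below is a minimal sketch of that arithmetic; the helper name `relative_reduction` is an assumption introduced here for illustration and does not come from the paper.

```python
def relative_reduction(baseline: float, system: float) -> float:
    """Relative reduction of `system` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - system) / baseline


# Hallucination rates (%) as reported in the abstract, parametric knowledge setting.
lpf_rate = 17.0
baseline_rates = {"Qwen-2.5-3B": 42.7, "Llama-3.2-3B": 40.0}

# Hypothetical helper applied to each baseline.
reductions = {
    name: relative_reduction(rate, lpf_rate)
    for name, rate in baseline_rates.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative reduction")
```

Under these inputs the values come out at roughly 60% against Qwen and roughly 58% against Llama, which matches the range quoted above.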

Abstract

Large Language Models (LLMs) excel at fluent text generation but struggle with multi-evidence reasoning, often producing confident yet incorrect answers when faced with conflicting sources or ambiguous queries. We introduce Latent Posterior Factors (LPF), a principled fact-checking system that achieves exact uncertainty decomposition, separating epistemic uncertainty (reducible via more evidence) from aleatoric uncertainty (irreducible ambiguity). Unlike LLMs that process evidence via attention mechanisms without formal guarantees, LPF provides seven theoretical results establishing calibration preservation, generalization bounds, and information-theoretic optimality. We evaluate LPF across three settings: (1) Evidence-based comparison on a synthetic multi-evidence benchmark, where all systems use identical retrieved evidence: LPF achieves a 17.0% hallucination rate with a balanced 4.0% abstention rate, while small LLMs face a paradox, with Qwen-2.5-3B achieving 0% hallucination but 92% abstention (essentially unusable) and Llama-3.2-3B achieving 6.7% hallucination but 65.3% abstention (overly cautious); (2) Parametric knowledge comparison on the same benchmark, where LPF uses evidence while baselines rely on internal knowledge: LPF maintains 17.0% hallucination versus 42.7% (Qwen) and 40.0% (Llama), a relative reduction of 60% and 58%, respectively; (3) Standard benchmark evaluation on TruthfulQA, where LPF matches baseline performance (50–56% hallucination rate) while providing interpretable uncertainty signals. Experiments across 5 random seeds demonstrate stable performance (hallucination rate: 17.0 ± 0.6%, ECE: 18.6 ± 1.4%). Critically, LPF achieves these results 100–500× faster than LLM baselines (21.7 queries/second vs. 0.04–0.13 queries/second) with zero LLM calls at inference time.
Our work demonstrates that exact uncertainty decomposition enables principled multi-evidence reasoning that solves the abstention paradox faced by small LLMs, achieving both low hallucination rates and practical utility, while maintaining computational efficiency and formal calibration guarantees.

Keywords: Multi-evidence reasoning, uncertainty quantification, epistemic uncertainty, aleatoric uncertainty, hallucination reduction, probabilistic aggregation, variational inference, fact-checking AI, calibrated AI systems, efficient inference, LLM alternatives, interpretable AI, computationally efficient reasoning, evidence-based decision-making, neural-symbolic reasoning
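
To make the epistemic/aleatoric distinction in the abstract concrete, the sketch below uses the standard entropy-based decomposition from the Bayesian uncertainty literature: total predictive entropy splits into the average per-source entropy (an aleatoric proxy) plus the remaining disagreement term (an epistemic proxy). This is not LPF's actual formulation, which the abstract does not specify. The function names and the per-source probability vectors are hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose(per_source):
    """Split total predictive entropy into aleatoric and epistemic parts.

    `per_source` is a list of probability vectors, one per (hypothetical)
    evidence source. By construction, total == aleatoric + epistemic.
    """
    n = len(per_source)
    k = len(per_source[0])
    mean = [sum(dist[i] for dist in per_source) / n for i in range(k)]
    total = entropy(mean)                                   # entropy of the averaged prediction
    aleatoric = sum(entropy(d) for d in per_source) / n     # average per-source entropy
    epistemic = total - aleatoric                           # disagreement between sources
    return total, aleatoric, epistemic


# Assumed toy inputs: two confident sources that contradict each other.
conflicting = [[0.95, 0.05], [0.05, 0.95]]
# Assumed toy inputs: two sources that agree the claim is a coin flip.
ambiguous = [[0.5, 0.5], [0.5, 0.5]]

for label, sources in [("conflicting", conflicting), ("ambiguous", ambiguous)]:
    total, aleatoric, epistemic = decompose(sources)
    print(f"{label}: total={total:.3f} aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
```

In this toy setup both cases have the same total entropy, but the conflicting sources put most of it in the epistemic term while the agreeing-but-unsure sources put all of it in the aleatoric term. That is the kind of separation an abstention policy could act on.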
