Large Language Models (LLMs) excel at fluent text generation but struggle with multi-evidence reasoning, often producing confident yet incorrect answers when sources conflict or queries are ambiguous. We introduce Latent Posterior Factors (LPF), a principled fact-checking system that achieves exact uncertainty decomposition, separating epistemic uncertainty (reducible with more evidence) from aleatoric uncertainty (irreducible ambiguity). Unlike LLMs, which aggregate evidence through attention mechanisms without formal guarantees, LPF is supported by seven theoretical results establishing calibration preservation, generalization bounds, and information-theoretic optimality. We evaluate LPF in three settings. (1) Evidence-based comparison on a synthetic multi-evidence benchmark, where all systems receive identical retrieved evidence: LPF achieves a 17.0% hallucination rate with a low 4.0% abstention rate, whereas small LLMs face a paradox. Qwen-2.5-3B reaches 0% hallucination but abstains on 92% of queries (essentially unusable), and Llama-3.2-3B reaches 6.7% hallucination but abstains on 65.3% (overly cautious). (2) Parametric-knowledge comparison on the same benchmark, where LPF uses evidence while the baselines rely on internal knowledge: LPF maintains 17.0% hallucination versus 42.7% (Qwen) and 40.0% (Llama), relative reductions of about 60% and 58%, respectively. (3) Standard benchmark evaluation on TruthfulQA, where LPF matches baseline performance (50–56% hallucination rate) while also providing interpretable uncertainty signals. Across 5 random seeds, performance is stable (hallucination rate: 17.0 ± 0.6%; expected calibration error (ECE): 18.6 ± 1.4%). Critically, LPF achieves these results 100–500× faster than the LLM baselines (21.7 queries/second vs. 0.04–0.13 queries/second) and makes zero LLM calls at inference time.
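The abstract does not specify how LPF computes its decomposition or its ECE. For reference only, the sketch below shows two standard, widely used tools under stated assumptions: the entropy-based split of predictive uncertainty over sampled posteriors (total entropy = expected entropy, read as aleatoric, + mutual information, read as epistemic), and ECE with equal-width confidence bins. This is not LPF's implementation; the function names, the three-verdict label space, and the ten-bin default are hypothetical choices made for illustration.

```python
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy in nats; probabilities are clipped to avoid log(0)."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)


def decompose_uncertainty(samples):
    """Standard entropy-based decomposition (not necessarily LPF's method).

    samples: shape (S, K), S sampled categorical distributions over K
    verdicts (hypothetically: supported / refuted / not enough info).
    Returns (total, aleatoric, epistemic) with total = aleatoric + epistemic.
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    total = entropy(mean)                # H[ E_z p(y | z) ]
    aleatoric = entropy(samples).mean()  # E_z H[ p(y | z) ]
    epistemic = total - aleatoric        # mutual information I(y; z)
    return total, aleatoric, epistemic


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted mean |accuracy - confidence| over equal-width bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Map each confidence to a bin index; confidence 1.0 goes into the last bin.
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```

In this decomposition, two confident but contradictory samples such as `[[1, 0, 0], [0, 1, 0]]` put nearly all uncertainty in the epistemic term (about ln 2 nats), while identical samples put all of it in the aleatoric term, which matches the "reducible vs. irreducible" reading used in the abstract.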
Our work demonstrates that exact uncertainty decomposition enables principled multi-evidence reasoning that resolves the abstention paradox faced by small LLMs, achieving both low hallucination rates and practical utility while maintaining computational efficiency and formal calibration guarantees.
Keywords: Multi-evidence reasoning, uncertainty quantification, epistemic uncertainty, aleatoric uncertainty, hallucination reduction, probabilistic aggregation, variational inference, fact-checking AI, calibrated AI systems, efficient inference, LLM alternatives, interpretable AI, computationally efficient reasoning, evidence-based decision-making, neural-symbolic reasoning
Author: Aliyu Agboola Alege (Wed,)