What question did this study set out to answer?

This study assesses the Boolean reasoning capabilities of large language models (LLMs) using a deterministic verification framework.

June 12, 2026 · Open Access

Deterministic Boolean Verification Catches What LLMs Miss: A Hallucination Benchmark

Key Points

Developed the Boolean Algebra Engine, an open-source deterministic verification framework for evaluating Boolean reasoning in LLMs.
Benchmarked seven LLMs on Boolean satisfiability tasks with machine-verified ground truth.
Analyzed hallucination rates and reasoning biases across varying expression complexities.
Identified consistent model-specific reasoning failures, including optimism and pessimism biases.
Found that reasoning errors remained relatively stable across increasing variable counts within the tested range.
The framework evaluates machine-verifiable expressions with exact correctness guarantees.
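
To make the hallucination-rate and bias measurements above concrete, here is a minimal Python sketch of how such tallies could be computed once each model answer has been paired with a verified ground-truth label. It is not the paper's implementation. The names `classify` and `summarize` are hypothetical, and the definitions used (an "optimistic" error claims satisfiable for an unsatisfiable expression, a "pessimistic" error claims unsatisfiable for a satisfiable one) are an assumed reading of the terms, which this summary does not define.

```python
from collections import Counter

def classify(predicted_sat: bool, actual_sat: bool) -> str:
    """Label one model answer against the verified ground truth.

    Assumed definitions (not taken from the paper):
      optimistic  = model says satisfiable, ground truth is unsatisfiable
      pessimistic = model says unsatisfiable, ground truth is satisfiable
    """
    if predicted_sat == actual_sat:
        return "correct"
    return "optimistic" if predicted_sat else "pessimistic"

def summarize(records):
    """Aggregate (predicted_sat, actual_sat) pairs into simple statistics."""
    counts = Counter(classify(p, a) for p, a in records)
    total = len(records)
    errors = counts["optimistic"] + counts["pessimistic"]
    return {
        "hallucination_rate": errors / total if total else 0.0,
        "optimistic": counts["optimistic"],
        "pessimistic": counts["pessimistic"],
    }

# Illustrative data only: (model's answer, verified ground truth).
records = [(True, True), (True, False), (False, True), (True, False)]
summary = summarize(records)
print(summary)
```

Keeping the two error directions separate, rather than reporting a single error count, is what would let a model-specific bias show up in the tallies.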

Abstract

This technical note presents the Boolean Algebra Engine, an open-source deterministic verification framework for evaluating Boolean reasoning in large language models (LLMs). The work combines a formal Boolean logic engine with an LLM-assisted translation layer, enabling natural-language Boolean queries to be converted into machine-verifiable expressions and evaluated with exact correctness guarantees. The paper includes a benchmark study of seven LLMs on Boolean satisfiability tasks with machine-verified ground truth, measuring hallucination rates across varying expression complexities. Results reveal consistent model-specific reasoning failures, including optimism and pessimism biases, and suggest that reasoning errors remain relatively stable across increasing variable counts within the tested range. The repository includes the full paper, benchmark methodology, experimental results, and implementation details of the Boolean Algebra Engine.

Keywords: Large Language Models, Boolean Logic, Formal Verification, Symbolic Reasoning, Hallucination Analysis, Quine-McCluskey, SAT Reasoning, Neurosymbolic Systems.
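
As an illustration of what machine-verified ground truth for a satisfiability task can look like, the following Python sketch decides satisfiability by exhaustive truth-table enumeration. This is a deliberately simple stand-in, not the Boolean Algebra Engine's own method, which this summary does not describe in detail. The function names and the representation of an expression as a Python callable are assumptions made for the example.

```python
from itertools import product

def is_satisfiable(variables, expr):
    """Check satisfiability by trying every truth assignment.

    `expr` is a callable taking a dict {variable name: bool}.
    Returns (True, witness) for the first satisfying assignment found,
    or (False, None) if no assignment satisfies the expression.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if expr(assignment):
            return True, assignment
    return False, None

def is_hallucination(model_claims_sat, variables, expr):
    """Flag a model answer that disagrees with the enumerated ground truth."""
    actual_sat, _ = is_satisfiable(variables, expr)
    return model_claims_sat != actual_sat

# Example expressions (illustrative, not drawn from the benchmark).
def expr_sat(a):
    # (A or B) and not A
    return (a["A"] or a["B"]) and not a["A"]

def expr_unsat(a):
    # A and not A
    return a["A"] and not a["A"]

sat_result, witness = is_satisfiable(["A", "B"], expr_sat)
unsat_result, _ = is_satisfiable(["A", "B"], expr_unsat)
print(sat_result, witness)   # True {'A': False, 'B': True}
print(unsat_result)          # False
```

Enumeration costs 2^n evaluations for n variables, so it only suits small expressions, but because the check is deterministic, its verdict can serve as an exact reference against which a model's answer is scored.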
