VDR-LLM-Prolog is an architecture for language models built on exact integer arithmetic, where every value is a triple Value, Denominator, Remainder of integers, data lives in scoped knowledge bases at integer addresses, and the language model orchestrates 448 deterministic primitives rather than generating computation as text. The arithmetic foundation uses Q335 — a fixed denominator of 2³35 providing 100 decimal digits of precision — where addition is one integer add and multiplication is one integer multiply followed by division by the fixed denominator. This paper presents a custom FPGA processor design that accelerates Q335 arithmetic and Prolog knowledge base operations on a Xilinx Zynq-7020 system-on-chip. The design exploits a structural property of the Q335 representation: because the denominator is a fixed power of two, division by the denominator is bit extraction — bits above position 335 are the quotient, bits below are the remainder. In digital logic, fixed-position bit extraction is wiring, not computation: zero logic cells, zero clock cycles beyond routing delay, zero power consumption. The processor implements ten VDR-Q335 cores, each with a 384-bit ALU (1-cycle addition, 9-cycle multiplication via iterative 128-bit tiling on DSP48E1 slices, 1-cycle comparison, 19-cycle cross-multiply for Prolog unification), a 2KB remainder tree in block RAM, and a 53-instruction ISA spanning wide arithmetic, remainder tree management, Prolog matching, and batch processing. A batch dispatcher distributes work across cores via DMA, and a binary reduction network combines partial results in 5 cycles. The system fits within 54. 2% of available lookup tables and 73. 4% of flip-flops, leaving headroom for timing closure and future additions. At 150 MHz, the design achieves single Prolog fact queries across 200 facts in approximately 200 nanoseconds, attention dot products for 64-dimensional vectors in approximately 6 microseconds, surrogate softmax over 100 logits in approximately 3. 3 microseconds, and SGD parameter updates in approximately 15 cycles per element. The ARM host processor runs the Zig VDR-LLM-Prolog runtime — orchestrated inference, knowledge base management, sessions, grammar-directed generation, and language model passes — while the FPGA accelerates parallel data-plane operations through a register-mapped control interface and high-bandwidth DMA. Every accelerated operation produces results bit-identical to the software implementation, verified against the project's 884-test validation suite with zero arithmetic errors. The architecture scales linearly: the same core design targets 60 cores on Zynq-7045, 120 on Zynq-7100, and 200+ on UltraScale+ devices.
Geoffrey Howland (Fri,) studied this question.