What question did this study set out to answer?

This research aims to explore the performance advantages of integer arithmetic over floating-point computation in the VDR-LLM-Prolog system.

May 18, 2026 · Open Access

VDR-LLM-Prolog: Performance: Integer Arithmetic on GPU Hardware: Why Wider Operands on More Cores Outrun Narrower Operands on Fewer Passes

Key Points

Evaluates the performance of the VDR-LLM-Prolog system against conventional floating-point language models.
Analyzes GPU utilization patterns for integer arithmetic with wider operands under massive parallelism.
Establishes that the integer arithmetic executing VDR primitives maps efficiently to GPU hardware, and outlines the architectural properties that support this performance.
Builds on VDR-15, which established a token reduction of 85 to 97 percent relative to conventional models.
Argues that the resulting GPU utilization patterns are structurally superior to the irregular, attention-dominated workloads of conventional language model inference.

Abstract

The VDR-LLM-Prolog system replaces floating-point arithmetic with exact integer computation. The immediate objection is performance: integer arithmetic on 100-digit numbers must be slower than hardware-accelerated floating-point on 16-bit or 32-bit values. This paper demonstrates that the objection confuses per-operation cost with per-prompt cost. A conventional language model spends thousands of tokens — each requiring a full forward pass through billions of floating-point parameters — on infrastructure work that VDR handles through exact primitive calls costing a few hundred integer operations each. VDR-15 established that the token reduction is 85 to 97 percent. This paper establishes that the integer arithmetic executing those primitives maps efficiently to GPU hardware, that the wider operands are offset by the massive parallelism of modern GPUs, and that several architectural properties of VDR — fixed-frame regularity, grammar-constrained decode, indexed knowledge base scans, and frontier-based Prolog execution — produce GPU utilization patterns that are structurally superior to the irregular, attention-dominated workloads of conventional language model inference. The complete GPU mapping is specified in the supplementary technical specification. This paper explains why it works, what the performance characteristics are, and where the actual bottlenecks lie.
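
The per-operation side of this argument can be made concrete with a small sketch. It assumes 32-bit limbs and plain schoolbook multiplication, neither of which the abstract specifies, and every name in it (`limbs_needed`, `schoolbook_multiply`, and so on) is hypothetical rather than taken from the VDR specification. It only counts how many limb-level multiplications one product of two 100-digit integers requires, and checks the result against Python's built-in integers.

```python
import math

LIMB_BITS = 32                      # assumed limb width, not taken from the paper
LIMB_MASK = (1 << LIMB_BITS) - 1


def limbs_needed(decimal_digits: int) -> int:
    """Limbs required to hold any non-negative integer with this many decimal digits."""
    bits = math.ceil(decimal_digits * math.log2(10))
    return math.ceil(bits / LIMB_BITS)


def to_limbs(value: int, count: int) -> list[int]:
    """Split a non-negative integer into `count` little-endian limbs."""
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs: list[int]) -> int:
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def schoolbook_multiply(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Multiply two limb arrays; return (product limbs, limb multiplications used)."""
    result = [0] * (len(a) + len(b))
    mults = 0
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            total = result[i + j] + ai * bj + carry
            result[i + j] = total & LIMB_MASK   # low limb stays in place
            carry = total >> LIMB_BITS          # high part moves to the next column
            mults += 1
        result[i + len(b)] = carry              # final carry of this row
    return result, mults


n = limbs_needed(100)
x = 10**100 - 1                     # largest 100-digit integer
y = 10**99 + 7                      # an arbitrary 100-digit integer
product, mults = schoolbook_multiply(to_limbs(x, n), to_limbs(y, n))
print(n, mults, from_limbs(product) == x * y)
```

Under these assumptions a 100-digit operand occupies 11 limbs and one multiplication costs 121 limb products. The sketch says nothing about how those products would be scheduled across GPU threads; that mapping is the subject of the paper and its supplementary specification.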

Cite This Study

Geoffrey Howland studied this question.

synapsesocial.com/papers/6a0aad2a5ba8ef6d83b70b72 https://doi.org/10.5281/zenodo.20236975
