What question did this study set out to answer?

This research aims to evaluate the performance of a fixed-denominator integer transformer model and project its capabilities at scale.

May 21, 2026 | Open Access

VDR-Zig Q16 Integer LLM: Performance Baseline and Datacenter Projection for Fixed-Denominator Integer Transformer Inference

Key points

Implemented a single-block transformer model in Zig with Q16 integer arithmetic.
Conducted benchmarks on a 2019 Intel Core i7 laptop, measuring forward pass and training times.
Projected performance improvements under SIMD vectorization, GPU integer cores, and datacenter conditions.
Achieved 688 ns per forward pass and 1.42 million tokens per second during inference.
All 5 verification tests passed, including bit-identical determinism and exact softmax sum-to-one.
Projected computational parity with INT8/INT16 quantized inference, with stronger precision guarantees, and compared against conventional float16/float32 inference at each level.

Abstract

We benchmark a single-block transformer language model implemented in Zig using Q16 fixed-denominator integer arithmetic (D = 2¹⁶ = 65536). The implementation uses no floating-point operations, no heap allocations, and no SIMD intrinsics. On a 2019 laptop (Intel Core i7-10th gen class, single core, scalar execution), the model achieves 688 ns per forward pass, 1,159 ns per training step, and 1.42 million tokens per second for greedy generation. All 5 verification tests pass, including bit-identical determinism and exact softmax sum-to-one. From this scalar baseline, we project performance under SIMD vectorization, GPU integer tensor cores, and datacenter-scale deployment, comparing directly against conventional float16/float32 and quantized INT8 inference at each level. The central finding is that VDR Q16 arithmetic maps to the same hardware instructions as quantized integer inference (widening multiply-accumulate with a right-shift epilogue), placing it at computational parity with INT8/INT16 quantization while providing stronger precision guarantees.
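To make the "widening multiply-accumulate with right-shift epilogue" pattern concrete, here is a minimal sketch of a Q16 dot product. It is written in Python for readability and is not taken from the paper's Zig code; the helper names `to_q16` and `q16_dot` are hypothetical. Python integers are unbounded, so the widening is implicit here; in Zig or C the operands would presumably be 32-bit and the accumulator 64-bit.

```python
from typing import Sequence

Q = 16          # number of fractional bits
D = 1 << Q      # fixed denominator, 2**16 = 65536


def to_q16(num: int, den: int) -> int:
    """Encode the rational num/den as a Q16 integer (floor rounding).

    Hypothetical helper for illustration; it avoids floats entirely.
    """
    return (num * D) // den


def q16_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Dot product of two Q16 vectors, returned in Q16.

    Each product x * y carries denominator D * D, so the accumulator
    must be wider than the operands. A single right shift at the end
    (the "epilogue") restores denominator D.
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y          # widening multiply-accumulate
    return acc >> Q           # arithmetic shift: floors, also for negatives


# 0.5 * 2 + 1.5 * (-0.25) = 0.625, and 0.625 * 65536 = 40960
a = [to_q16(1, 2), to_q16(3, 2)]
b = [to_q16(2, 1), to_q16(-1, 4)]
result = q16_dot(a, b)        # 40960
```

Shifting once after the full accumulation, rather than after every product, keeps the rounding error to a single truncation per output element. This is the same structure that INT8 quantized kernels use, which is the basis for the parity claim above.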
