March 10, 2026 | Open Access

Integer Intelligence: A Reproducible Path from Training to FPGA

Key Points

The research aims to establish a clear, reproducible pathway from neural network training to fixed-point FPGA hardware.
Used a didactic XOR convolutional network to make backpropagation, quantization, and fixed-point arithmetic concrete and exactly checkable.
Applied post-training quantization to convert models to INT8.
Implemented a synthesizable INT8 datapath in Verilog for hardware deployment.
Used a co-simulation harness to enable deterministic comparisons of activations, partial sums, and outputs.
Mapped the design to an Artix-7 FPGA on the CMOD A7 board and showed feasibility at 100 MHz.
Measured inference latency and throughput, with latency on the millisecond scale.
Observed zero mismatches across large test sets, validating the quantization configuration and the verification strategy.

Abstract

A transparent, end-to-end pathway from learning-level training to deployable fixed-point hardware is presented and framed as gradients to gates. A didactic XOR convolutional network is first employed so that backpropagation, post-training quantization in INT8, and fixed-point arithmetic can be made concrete and verified with exact checks. The same methodology was applied to a compact LeNet-5 case study. On the software side, the training-to-export flow was formalized, and a bit-accurate Python reference was constructed for the quantized network. On the hardware side, a synthesizable INT8 datapath was implemented in Verilog, including multiply–accumulate units, sigmoid activation stages, and per-layer requantization with rounding and saturation. Test benches are provided so that the exported weights and activations can be ingested, and layer-wise matches can be reported. A co-simulation harness was used to coordinate framework inference, quantization, file conversion, HDL simulation, and regression checks, which enabled deterministic comparisons of the activations, partial sums, and outputs. The complete loop was mapped to Artix-7 on the CMOD A7 development board, and the resource usage, maximum clock frequency, inference latency, and throughput were determined. The approach aligns with an educational HDL-to-Caffe pipeline by using reusable parameterized Verilog primitives for convolution, pooling, activation, and fully connected layers, training in Colab with AccDNN, Caffe, quantization, and an automated bit-for-bit verification regime before FPGA synthesis. Methodological contributions are provided, including a minimal and auditable XOR CNN that exposes scales, shifts, and saturation; a practical quantization recipe with INT32 accumulation and unit tests that guarantee agreement within one least significant bit between RTL and the INT8 reference; and a scalable mapping to LeNet-5 using a row-stationary and line-buffered dataflow on an Artix-7 FPGA. Empirical evidence shows feasibility at 100 MHz with representative utilization, millisecond-scale latency, and zero mismatches across large test sets, which validates the quantization configuration and the verification strategy.
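
To make the abstract's quantization recipe concrete, the following is a minimal Python sketch of INT32 accumulation followed by requantization with rounding and saturation, plus a mismatch counter of the kind a layer-wise regression check might use. It is an illustration under stated assumptions, not the authors' reference model. The function names (`mac_int32`, `requantize`, `count_mismatches`), the fixed-point `multiplier` and `shift` values, and the round-half-up rule (add half, then arithmetic right shift) are all hypothetical choices, since the abstract does not specify them.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def saturate_int8(x: int) -> int:
    """Clamp an integer to the signed 8-bit range."""
    return max(INT8_MIN, min(INT8_MAX, x))


def mac_int32(weights, activations, bias=0):
    """Multiply-accumulate INT8 operands into a wide accumulator."""
    acc = bias
    for w, a in zip(weights, activations):
        acc += w * a
    # Assumption: the layer is sized so the sum fits in 32 bits.
    assert INT32_MIN <= acc <= INT32_MAX, "accumulator overflow"
    return acc


def requantize(acc: int, multiplier: int, shift: int) -> int:
    """Scale an accumulator back to INT8.

    The real-valued scale is approximated as multiplier / 2**shift.
    Adding half of the divisor before the arithmetic right shift
    rounds to nearest, with ties going toward +infinity (an assumed
    rounding mode). Requires shift >= 1.
    """
    rounded = (acc * multiplier + (1 << (shift - 1))) >> shift
    return saturate_int8(rounded)


def count_mismatches(reference, dut, tol=0):
    """Count positions where two output vectors differ by more than tol LSBs."""
    return sum(1 for r, d in zip(reference, dut) if abs(r - d) > tol)


# Hypothetical operands: three INT8 weights and activations plus a bias.
acc = mac_int32([64, -32, 127], [10, 20, -5], bias=100)  # acc == -535
out = requantize(acc, multiplier=77, shift=10)           # out == -40
```

With `tol=0`, `count_mismatches` corresponds to a bit-for-bit comparison; with `tol=1`, it corresponds to the one-LSB agreement bound mentioned in the abstract. A large accumulator such as `100000` with the same multiplier and shift saturates to `127`, which shows the clamp at work.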
