A transparent, end-to-end pathway from learning-level training to deployable fixed-point hardware is presented and framed as "gradients to gates". A didactic XOR convolutional network is employed first, so that backpropagation, INT8 post-training quantization, and fixed-point arithmetic can be made concrete and verified with exact checks. The same methodology is then applied to a compact LeNet-5 case study. On the software side, the training-to-export flow was formalized, and a bit-accurate Python reference was constructed for the quantized network. On the hardware side, a synthesizable INT8 datapath was implemented in Verilog, including multiply–accumulate units, sigmoid activation stages, and per-layer requantization with rounding and saturation. Test benches are provided that ingest the exported weights and activations and report layer-wise matches. A co-simulation harness coordinated framework inference, quantization, file conversion, HDL simulation, and regression checks, which enabled deterministic comparisons of the activations, partial sums, and outputs. The complete loop was mapped to an Artix-7 FPGA on the CMOD A7 development board, and the resource usage, maximum clock frequency, inference latency, and throughput were determined. The approach aligns with an educational HDL-to-Caffe pipeline by combining reusable parameterized Verilog primitives for convolution, pooling, activation, and fully connected layers with training in Colab using AccDNN and Caffe, quantization, and an automated bit-for-bit verification regime before FPGA synthesis. The methodological contributions include a minimal and auditable XOR CNN that exposes scales, shifts, and saturation; a practical quantization recipe with INT32 accumulation and unit tests that guarantee agreement within one least significant bit between the RTL and the INT8 reference; and a scalable mapping to LeNet-5 using a row-stationary, line-buffered dataflow on an Artix-7 FPGA.
Empirical evidence shows feasibility at 100 MHz with representative utilization, millisecond-scale latency and zero mismatches across large test sets, which validates the quantization configuration and the verification strategy.
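As an illustration of the per-layer arithmetic summarized above, the following minimal Python sketch shows INT8 operands accumulated in INT32 and then requantized with rounding and saturation, plus a least-significant-bit comparison of the kind a regression check might perform. It is not the authors' reference model: the function names, the multiplier-and-shift form of the scale, the round-half-up convention, and all numeric values are assumptions chosen for illustration.

```python
# Hypothetical sketch of an INT8 layer step: INT32 accumulation, then
# requantization with rounding and saturation. Names and values are
# illustrative assumptions, not taken from the paper.

INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def saturate_int8(x: int) -> int:
    """Clamp an integer to the signed 8-bit range."""
    return max(INT8_MIN, min(INT8_MAX, x))


def mac_int32(weights, activations, bias=0):
    """Multiply-accumulate INT8 operands into a wide accumulator."""
    acc = bias
    for w, a in zip(weights, activations):
        acc += w * a
    # A hardware INT32 accumulator must not overflow for the layer sizes used.
    assert INT32_MIN <= acc <= INT32_MAX, "accumulator exceeds INT32"
    return acc


def requantize(acc: int, multiplier: int, shift: int) -> int:
    """Rescale an accumulator to INT8 as (acc * multiplier) / 2**shift.

    Adding half of the divisor before the arithmetic right shift rounds
    to nearest (ties toward +infinity); the result is then saturated.
    Assumes shift >= 1.
    """
    prod = acc * multiplier
    rounded = (prod + (1 << (shift - 1))) >> shift
    return saturate_int8(rounded)


def max_lsb_error(reference, candidate) -> int:
    """Largest absolute difference, in LSBs, between two INT8 sequences."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


if __name__ == "__main__":
    acc = mac_int32([12, -7, 3], [100, 50, -20])  # 1200 - 350 - 60
    out = requantize(acc, multiplier=77, shift=10)
    print(acc, out)
```

The multiplier-and-shift pair stands in for a real-valued rescaling factor (here 77 / 1024, roughly 0.075), which is one common way to avoid floating point in hardware. In Python, `>>` on a negative integer is an arithmetic shift, so the same expression handles both signs.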
Shanker et al. (Sun,) studied this question.