What question did this study set out to answer?

This research aims to develop a secure and efficient DNN accelerator for FPGA platforms to address challenges in data security and resource utilization.

May 24, 2026 | Open Access

SaE-FPGA: A Secure and Efficient DNN Accelerator on FPGA with Integrated Hash-Bypass and BRAM-LUT Mixed-Precision Booth Multiply

Key Points

Proposed SaE-FPGA architecture integrating a Hash-Bypass Processing Unit for real-time data verification.
Implemented Flexible Mixed-Precision Processing Element utilizing BRAM and LUT for varied bit-width multiplication.
Used a Multi-mode Reconfigurable Streaming Frame for sparsity-aware load balancing and efficient data routing.
Reduced redundant computations by 23.2% on the Zynq 7045 platform with minimal precision loss.
Achieved a 27.2% improvement in energy efficiency and a 2.97× speedup over DSP-based FPGA solutions on ResNet-50.
Attained a peak throughput of 782.4 GOPS by fully utilizing hybrid BRAM-LUT and DSP configurations.
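
The mixed-precision Booth multiplication named in the title can be illustrated with a small software model. The sketch below is not the paper's hardware design; it is a generic radix-4 Booth multiplier, and the function names `booth_radix4_digits` and `booth_multiply` are hypothetical. It shows why narrower operands are cheaper: an INT8 multiplier needs four partial products, INT6 three, and INT4 two, and each partial product is one of only five multiples of the multiplicand, which is what makes a small lookup table a plausible substitute for a DSP multiplier.

```python
from typing import List


def booth_radix4_digits(y: int, bits: int) -> List[int]:
    """Recode a signed `bits`-wide integer into radix-4 Booth digits.

    Each digit is in {-2, -1, 0, 1, 2}; digit i has weight 4**i.
    """
    if bits <= 0 or bits % 2 != 0:
        raise ValueError("bits must be a positive even number")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the requested bit width")
    pattern = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int) -> int:
    """Multiply x by a signed `bits`-wide y using radix-4 Booth digits."""
    # Five candidate partial products; in hardware these could come from
    # a small table instead of a multiplier (an assumption of this sketch).
    partials = {d: d * x for d in (-2, -1, 0, 1, 2)}
    acc = 0
    for i, d in enumerate(booth_radix4_digits(y, bits)):
        acc += partials[d] << (2 * i)  # weight 4**i
    return acc


if __name__ == "__main__":
    for bits in (8, 6, 4):
        y = -(1 << (bits - 1)) + 1  # a value that fits in `bits`
        digits = booth_radix4_digits(y, bits)
        print(bits, len(digits), booth_multiply(57, y, bits) == 57 * y)
```

The number of loop iterations in `booth_multiply` is `bits // 2`, so dropping from INT8 to INT4 halves the partial products that must be looked up and accumulated.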

Abstract

With the rapid deployment of deep neural networks (DNNs) on edge devices, traditional hardware accelerators face significant challenges in terms of data security, computational redundancy caused by sparsity, and uneven utilization of on-chip resources. This paper proposes SaE-FPGA, a secure and efficient DNN accelerator designed specifically for edge FPGA platforms. The architecture introduces three core innovations: (1) Hash-Bypass Processing Unit (HBPU): By integrating a high-speed SHA-256 hardware engine with a hash-sparse bitmap mechanism, it enables real-time data integrity verification within a single clock cycle while skipping computations on zero-valued data. (2) Flexible Mixed-Precision Processing Element (FMP): By reconfiguring idle BRAM and LUT resources into an active lookup-table multiplication engine, it overcomes the physical bit-width limitations of DSP blocks and supports INT8/INT6/INT4 mixed-precision multiplication. (3) Multi-mode Reconfigurable Streaming Frame (MRSF): A sparsity-aware, elastic load-balancing and data-routing mechanism designed to mask long memory access latencies and ensure high hardware resource utilization. Experimental results on the Zynq 7045 platform demonstrate that SaE-FPGA reduces redundant computations by 23.2% with minimal precision loss, and that the system effectively mitigates the risk of physical tampering. When tested on ResNet-50, it achieved a 27.2% improvement in energy efficiency and a 2.97× speedup compared to DSP-based FPGA solutions. Furthermore, by fully exploiting the hybrid BRAM-LUT and DSP configuration, the proposed accelerator achieves a peak throughput of 782.4 GOPS.
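
The idea behind the HBPU, checking integrity and skipping zeros in the same pass, can be sketched in software. The following is a minimal illustration and not the paper's implementation: the helper names `seal_block` and `verified_sparse_dot` are hypothetical, the data layout (one byte per INT8 activation, one bitmap flag per value) is an assumption, and the reference digest is assumed to be kept somewhere an attacker cannot modify. Only Python's standard `hashlib` and `hmac` modules are used.

```python
import hashlib
import hmac
from typing import List, Sequence, Tuple


def seal_block(values: Sequence[int]) -> Tuple[bytes, List[int], bytes]:
    """Pack INT8 values and return (payload, sparsity bitmap, SHA-256 digest)."""
    if any(not -128 <= v <= 127 for v in values):
        raise ValueError("values must fit in INT8")
    payload = bytes(v & 0xFF for v in values)       # two's-complement bytes
    bitmap = [1 if v != 0 else 0 for v in values]   # 1 marks a nonzero value
    digest = hashlib.sha256(payload).digest()
    return payload, bitmap, digest


def verified_sparse_dot(payload: bytes, bitmap: Sequence[int],
                        digest: bytes, weights: Sequence[int]) -> Tuple[int, int]:
    """Verify the block, then accumulate only the nonzero positions.

    Returns (dot product, number of multiply-accumulates performed).
    """
    actual = hashlib.sha256(payload).digest()
    if not hmac.compare_digest(actual, digest):
        raise ValueError("integrity check failed: block was modified")
    acc = 0
    macs = 0
    for byte, nonzero, w in zip(payload, bitmap, weights):
        if not nonzero:
            continue  # bypass: no multiply is issued for a zero value
        a = byte - 256 if byte >= 128 else byte  # back to signed INT8
        acc += a * w
        macs += 1
    return acc, macs


if __name__ == "__main__":
    payload, bitmap, digest = seal_block([3, 0, -2, 0, 0, 5])
    print(verified_sparse_dot(payload, bitmap, digest, [1, 2, 3, 4, 5, 6]))
```

In this example half of the six positions are zero, so only three multiply-accumulates are performed. A plain hash detects modification only if the stored digest itself is trustworthy; a keyed construction such as HMAC would be the usual choice when it is not.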

Cite This Study

Zhang et al. studied this question.

synapsesocial.com/papers/6a1296c748a0ea1665673d45 https://doi.org/10.3390/electronics15112255
