The deployment of Convolutional Neural Networks (CNNs) on entry-level edge FPGAs is severely constrained by the scarcity of Digital Signal Processing (DSP) blocks, a limitation termed the “DSP Wall”. To circumvent this bottleneck, this paper presents AEMAC, a software-hardware co-designed accelerator architecture that decouples arithmetic computation from DSP availability. The methodology combines a software-level Dynamic Integer Scaling strategy with a hardware-level Adaptive Error-Compensated Multiply-Accumulate unit. Floating-point activations are mapped to an optimal integer domain and processed by a DSP-free, LUT-based tri-mode datapath, so that all arithmetic is carried out in general-purpose logic. To mitigate the precision loss inherent in logic-based truncation, a statistical bias compensation mechanism is integrated into the accumulator chain. Experimental validation on a Xilinx Zynq-7020 FPGA demonstrates a zero-DSP implementation with minimal logic utilization (100 LUTs). Post-implementation timing simulations report a dynamic power of 0.490 W for a 64-core cluster under worst-case random workloads, yielding an energy efficiency of 26.1 GOPS/W. Micro-level analysis shows a 16.7% reduction in arithmetic Mean Absolute Error (MAE) compared to naive truncation. Macro-level evaluation on the CIFAR-10 dataset shows that the co-design strategy recovers system accuracy to 64.74%, outperforming the uncompensated baseline by 0.55% and achieving statistical comparability to floating-point baselines. All hardware metrics are validated via SAIF-based post-implementation simulations. Together with a conservative full-chip projection that incorporates a routing derating model, these results establish AEMAC as a scalable and reliable solution for breaking the DSP wall in resource-constrained edge intelligence.
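The idea of compensating truncation error with a statistical bias in the accumulator can be illustrated with a small software model. The sketch below is not the AEMAC datapath: it assumes unsigned 8-bit operands, a product whose 4 least-significant bits are dropped, and a single constant bias equal to the rounded mean truncation error over all operand pairs. The names `truncated_product`, `estimate_bias`, `mac`, and `mean_abs_error`, and the parameters `WIDTH` and `K`, are hypothetical. The numbers it produces are properties of this toy model and should not be read as reproducing the 16.7% MAE reduction reported above.

```python
from itertools import product

WIDTH = 8  # operand width in bits (assumption for this sketch)
K = 4      # number of low product bits dropped by truncation (assumption)


def truncated_product(a: int, b: int, k: int = K) -> int:
    """Return a*b with its k least-significant bits forced to zero."""
    return ((a * b) >> k) << k


def estimate_bias(width: int = WIDTH, k: int = K) -> int:
    """Mean truncation error over all unsigned operand pairs, rounded to nearest."""
    n = 1 << (2 * width)
    total = sum(a * b - truncated_product(a, b, k)
                for a, b in product(range(1 << width), repeat=2))
    return (total + n // 2) // n


def mac(xs, ws, bias: int = 0, k: int = K) -> int:
    """Multiply-accumulate with truncated products; bias is added per product."""
    acc = 0
    for x, w in zip(xs, ws):
        acc += truncated_product(x, w, k) + bias
    return acc


def mean_abs_error(bias: int, width: int = WIDTH, k: int = K) -> float:
    """Per-product MAE against the exact product, over all operand pairs."""
    n = 1 << (2 * width)
    total = sum(abs(a * b - (truncated_product(a, b, k) + bias))
                for a, b in product(range(1 << width), repeat=2))
    return total / n


if __name__ == "__main__":
    bias = estimate_bias()
    print("estimated bias:", bias)
    print("MAE, naive truncation:", mean_abs_error(0))
    print("MAE, bias-compensated:", mean_abs_error(bias))
```

Truncation always rounds toward zero, so its error is one-sided; adding a constant close to the mean error recentres the error distribution around zero, which is what lowers the MAE in this model. In hardware the same constant could be folded into the accumulator once per product or once per dot product, at the cost of an adder input rather than a multiplier.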
Liu et al. (Fri,) studied this question.