What question did this study set out to answer?

To develop a resource-efficient MAC architecture that overcomes DSP constraints for edge AI applications.

April 12, 2026 · Open Access

Breaking the DSP Wall: A Software–Hardware Co-Designed, Adaptive Error-Compensated MAC Architecture for Efficient Edge AI

Key points

Designed the AEMAC architecture around co-designed software and hardware components.
Applied dynamic integer scaling (software) and adaptive error compensation (hardware).
Implemented on a Xilinx Zynq-7020 FPGA with zero DSP utilization.
Validated hardware metrics through SAIF-based post-implementation simulations.
Achieved an energy efficiency of 26.1 GOPS/W with minimal logic use (100 LUTs).
Reduced arithmetic Mean Absolute Error (MAE) by 16.7% compared with naive truncation.
Recovered system accuracy to 64.74% on CIFAR-10, 0.55% above the uncompensated baseline and statistically comparable to floating-point baselines.
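The dynamic integer scaling step can be illustrated with a minimal Python sketch. It assumes symmetric, per-tensor max-abs scaling to signed 8-bit integers, which is one common choice; the paper's criterion for the "optimal integer domain" may differ, and the function names `dynamic_integer_scale` and `integer_dot` are hypothetical. The point is that once the scales are fixed at runtime, the inner multiply-accumulate loop uses only integers, and a single floating-point rescale is applied at the end.

```python
import random


def dynamic_integer_scale(values, bits=8):
    """Map floats to signed `bits`-bit integers using a scale derived from the data.

    Assumption: symmetric max-abs scaling, so the largest magnitude in `values`
    maps to the largest representable integer (127 for 8 bits).
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale


def integer_dot(a_q, a_scale, w_q, w_scale):
    """Integer-only MAC loop followed by one floating-point rescale."""
    acc = sum(x * y for x, y in zip(a_q, w_q))
    return acc * a_scale * w_scale


rng = random.Random(0)
acts = [rng.uniform(-3.0, 3.0) for _ in range(64)]
weights = [rng.uniform(-0.5, 0.5) for _ in range(64)]

a_q, a_s = dynamic_integer_scale(acts)
w_q, w_s = dynamic_integer_scale(weights)

exact = sum(a * w for a, w in zip(acts, weights))
approx = integer_dot(a_q, a_s, w_q, w_s)
print(f"float dot: {exact:.4f}, integer dot: {approx:.4f}")
```

Because the scale is recomputed from each tensor's observed range, small-magnitude weights still use the full 8-bit code space instead of collapsing onto a few integer levels.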

Abstract

The deployment of Convolutional Neural Networks (CNNs) on entry-level edge FPGAs is severely constrained by the scarcity of Digital Signal Processing (DSP) blocks, a phenomenon termed the “DSP Wall”. To circumvent this bottleneck, this paper presents AEMAC, a software–hardware co-designed accelerator architecture that decouples arithmetic computation from DSP availability. The methodology combines a software-level Dynamic Integer Scaling strategy with a hardware-level Adaptive Error-Compensated Multiply-Accumulate (MAC) unit. By mapping floating-point activations to an optimal integer domain and employing a DSP-free, LUT-based tri-mode datapath, the architecture achieves high resource efficiency. To mitigate the precision loss inherent in logic-based truncation, a statistical bias compensation mechanism is integrated into the accumulator chain. Experimental validation on a Xilinx Zynq-7020 FPGA demonstrates a strictly zero-DSP implementation with minimal logic utilization (100 LUTs). Post-implementation timing simulations report a dynamic power of 0.490 W for a 64-core cluster under worst-case random workloads, yielding a verified energy efficiency of 26.1 GOPS/W (an aggregate throughput of roughly 12.8 GOPS). At the micro level, arithmetic Mean Absolute Error (MAE) is 16.7% lower than with naive truncation. At the macro level, evaluation on the CIFAR-10 dataset shows that the co-design strategy recovers system accuracy to 64.74%, outperforming the uncompensated baseline by 0.55% and achieving statistical comparability with floating-point baselines. For internal consistency, all hardware metrics are validated via SAIF-based post-implementation simulations. Based on a conservative full-chip projection that incorporates a routing derating model, these results establish AEMAC as a scalable and reliable solution for breaking the DSP Wall in resource-constrained edge intelligence.
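The idea behind statistical bias compensation can be sketched as follows. The assumptions here are not taken from the paper: the LUT multiplier floor-truncates each product by a hypothetical `DROP_BITS = 4` least-significant bits, the compensation term is the mean per-product loss estimated from random calibration operands, and it is added once per accumulation. Floor truncation always discards a non-negative amount, so uncompensated errors pile up in one direction; adding back their expected value leaves only zero-mean noise. The size of the improvement in this toy setting is not expected to match the 16.7% MAE reduction reported for AEMAC.

```python
import random

DROP_BITS = 4  # hypothetical: low-order product bits discarded by the LUT multiplier


def truncated_product(a: int, b: int, k: int = DROP_BITS) -> int:
    """Exact product with its k least-significant bits cleared (floor truncation)."""
    return ((a * b) >> k) << k


def calibrate_bias(pairs, k: int = DROP_BITS) -> float:
    """Mean value lost per product to truncation, estimated from calibration operands."""
    losses = [a * b - truncated_product(a, b, k) for a, b in pairs]
    return sum(losses) / len(losses)


def mac(xs, ws, k: int = DROP_BITS, bias: float = 0.0) -> int:
    """Accumulate truncated products, then add one compensation term for the whole sum."""
    acc = sum(truncated_product(x, w, k) for x, w in zip(xs, ws))
    return acc + round(bias * len(xs))


def random_int8_vector(rng: random.Random, n: int):
    """Uniformly random signed 8-bit operands."""
    return [rng.randint(-128, 127) for _ in range(n)]


rng = random.Random(1)
calibration = list(zip(random_int8_vector(rng, 10_000), random_int8_vector(rng, 10_000)))
bias = calibrate_bias(calibration)

err_naive, err_comp = [], []
for _ in range(1_000):
    xs, ws = random_int8_vector(rng, 64), random_int8_vector(rng, 64)
    exact = sum(x * w for x, w in zip(xs, ws))
    err_naive.append(abs(mac(xs, ws) - exact))
    err_comp.append(abs(mac(xs, ws, bias=bias) - exact))

print(f"calibrated bias per product: {bias:.2f}")
print(f"MAE naive: {sum(err_naive) / len(err_naive):.1f}, "
      f"compensated: {sum(err_comp) / len(err_comp):.1f}")
```

For uniformly random 8-bit operands the calibrated bias comes out near 6.5 rather than the 7.5 that a uniform-low-bits assumption would give, because products are skewed toward even values. Estimating the bias from data instead of assuming it is one way to make the compensation statistical.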
