This paper presents a comparative evaluation of two integration strategies for the Xilinx Zynq-7000 System-on-Chip (SoC): an Advanced eXtensible Interface Direct Memory Access (AXI DMA)-based architecture and a Block RAM (BRAM)-based architecture. Both designs employ a custom processing element (PE) for arithmetic operations, yet they differ significantly in their data transfer and buffering mechanisms. In the AXI DMA design, communication between the processing system (PS) and the programmable logic (PL) takes place over an AXI4-Stream interface controlled by a DMA engine. In contrast, the BRAM-based design uses dual-port block memories accessed through AXI BRAM controllers, enabling direct operand access. Implementation results indicate that both designs comfortably meet the resource constraints of the XC7Z020 device. However, the AXI DMA-based architecture exhibits higher hardware resource utilization, with average consumption approximately 54% greater than that of the BRAM-based design. Performance analysis reveals a pronounced latency difference: the AXI DMA design required an average of ~1.19 ms per operation, whereas the BRAM-based approach reduced this to ~0.10 ms, resulting in a total execution time of 32,487 µs for the BRAM-based design compared to 359,919 µs for the AXI DMA design. These findings demonstrate a clear trade-off between scalability and latency. While AXI DMA provides flexibility and throughput for stream-oriented applications, BRAM-based integration delivers superior efficiency in small-scale, low-latency scenarios. The study offers practical insights for guiding the design of Field-Programmable Gate Array (FPGA)-based accelerators on heterogeneous computing platforms.
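The latency figures quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it uses just the numbers stated in the abstract, and the operation count it derives is an inference from those numbers, not a value reported by the author. The helper names are hypothetical.

```python
# Figures quoted in the abstract.
AXI_DMA_TOTAL_US = 359_919   # total execution time, AXI DMA design (microseconds)
BRAM_TOTAL_US = 32_487       # total execution time, BRAM design (microseconds)
AXI_DMA_PER_OP_MS = 1.19     # approximate per-operation latency, AXI DMA design


def speedup(slow_us: float, fast_us: float) -> float:
    """Ratio of the slower total time to the faster total time."""
    return slow_us / fast_us


def implied_operations(total_us: float, per_op_ms: float) -> float:
    """Number of operations implied by a total time and a per-operation latency.

    This is an inferred quantity; the abstract does not state the count.
    """
    return total_us / (per_op_ms * 1000.0)


ratio = speedup(AXI_DMA_TOTAL_US, BRAM_TOTAL_US)
ops = implied_operations(AXI_DMA_TOTAL_US, AXI_DMA_PER_OP_MS)
# Per-operation BRAM latency, assuming both designs ran the same number of operations.
bram_per_op_ms = BRAM_TOTAL_US / ops / 1000.0

print(f"speedup (AXI DMA total / BRAM total): {ratio:.2f}x")
print(f"implied operation count:              {ops:.0f}")
print(f"implied BRAM latency per operation:   {bram_per_op_ms:.3f} ms")
```

Under the assumption that both designs executed the same workload, the totals imply roughly an 11x difference and a BRAM latency of about 0.11 ms per operation, which is in line with the ~0.10 ms quoted.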
Güner Tatar (Fri,) studied this question.