What does this research mean for the field?

The AXI DMA-based architecture exhibits approximately 54% higher hardware resource utilization and greater latency than the BRAM-based architecture in SoC-FPGA designs. Novelty: confirmatory. Consensus alignment: neutral.

What question did this study set out to answer?

The research aims to compare two integration strategies for SoC-FPGA designs: AXI-DMA and BRAM, focusing on latency and resource usage.

March 1, 2026 | Open Access

Latency and Resource Trade-off Analysis of AXI-DMA and BRAM Integration Approaches on SoC-FPGA

Key Points

Comparative evaluation of AXI-DMA and BRAM architectures on Xilinx Zynq-7000 SoC
Implementation of a custom processing element for arithmetic operations in both designs
Assessment of data transfer mechanisms and hardware resource utilization on the XC7Z020 device
Performance analysis of latency for both architectures
AXI DMA architecture utilized approximately 54% more hardware resources than the BRAM design
Average latency for AXI DMA was ~1.19 ms per operation, while the BRAM design reduced it to ~0.10 ms
Total execution times were 32,487 µs for BRAM and 359,919 µs for AXI DMA
Trade-off highlighted between scalability of AXI DMA and efficiency of BRAM in low-latency applications
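The latency and execution-time figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes that both designs ran the same number of operations and that each total is simply the operation count times the average per-operation latency; the source does not state the count, so `implied_ops` is an inference, not a reported value.

```python
# Figures quoted in the key points above.
AXI_DMA_TOTAL_US = 359_919   # total execution time, AXI DMA design
BRAM_TOTAL_US = 32_487       # total execution time, BRAM design
AXI_DMA_PER_OP_MS = 1.19     # approximate average latency per operation

# ASSUMPTION: both designs ran the same number of operations, and each total
# equals count * average latency. The source does not report the count.
implied_ops = AXI_DMA_TOTAL_US / (AXI_DMA_PER_OP_MS * 1000.0)

# Per-operation BRAM latency implied by the BRAM total and that count.
bram_per_op_ms = BRAM_TOTAL_US / implied_ops / 1000.0

# Ratio of the two reported totals.
speedup = AXI_DMA_TOTAL_US / BRAM_TOTAL_US

print(f"implied operation count: {implied_ops:.1f}")
print(f"implied BRAM latency per operation: {bram_per_op_ms:.3f} ms")
print(f"AXI DMA total / BRAM total: {speedup:.2f}x")
```

Under these assumptions the totals imply roughly 302 operations, a BRAM latency of about 0.107 ms per operation, and an overall ratio of about 11x. That is consistent with reading the ~0.10 ms figure as the BRAM design's per-operation latency rather than as the size of the improvement.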

Abstract

This paper presents a comparative evaluation of two integration strategies for the Xilinx Zynq-7000 System-on-Chip (SoC): an Advanced eXtensible Interface Direct Memory Access (AXI DMA)-based architecture and a Block RAM (BRAM)-based architecture. Both designs employ a custom processing element (PE) for arithmetic operations, yet they differ significantly in data transfer and buffering mechanisms. In the AXI DMA design, communication between the processing system (PS) and programmable logic (PL) is achieved via an AXI4-Stream interface controlled by a DMA engine. In contrast, the BRAM-based design uses dual-port block memories accessed through AXI BRAM controllers, enabling direct operand access. Implementation results indicate that both designs comfortably meet the resource constraints of the XC7Z020 device. However, the AXI DMA-based architecture exhibits higher hardware resource utilization, with average consumption approximately 54% greater than that of the BRAM-based design. Performance analysis reveals a pronounced latency difference: the AXI DMA design required an average of ~1.19 ms per operation, whereas the BRAM-based approach reduced this to ~0.10 ms, resulting in a total execution time of 32,487 µs for BRAM compared to 359,919 µs for AXI DMA. These findings demonstrate a clear trade-off between scalability and latency. While AXI DMA provides flexibility and throughput for stream-oriented applications, BRAM-based integration delivers superior efficiency in small-scale, low-latency scenarios. The study offers practical insights for guiding the design of Field-Programmable Gate Array (FPGA)-based accelerators on heterogeneous computing platforms.
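One way to picture the scalability-versus-latency trade-off described in the abstract is a simple linear cost model: a fixed setup cost per transfer plus a constant cost per word. The sketch below is illustrative only. The model itself and every constant in it are hypothetical placeholders chosen to show the shape of the trade-off; none of them are measurements from the paper.

```python
def transfer_latency_ns(n_words, setup_ns, per_word_ns):
    """Linear cost model: fixed setup plus a constant cost per word."""
    return setup_ns + n_words * per_word_ns


def crossover_words(setup_a_ns, per_word_a_ns, setup_b_ns, per_word_b_ns):
    """Smallest word count at which path A is no slower than path B.

    Returns None if path A never catches up under the linear model.
    """
    if setup_a_ns <= setup_b_ns:
        return 0  # A is already no slower for an empty transfer
    if per_word_a_ns >= per_word_b_ns:
        return None  # A starts behind and never gains ground
    gap = setup_a_ns - setup_b_ns
    rate = per_word_b_ns - per_word_a_ns
    return -(-gap // rate)  # integer ceiling division


# HYPOTHETICAL parameters, in nanoseconds. Not taken from the paper.
DMA_SETUP_NS = 50_000      # descriptor setup, cache maintenance, completion wait
DMA_PER_WORD_NS = 10       # streaming cost once the transfer is running
BRAM_SETUP_NS = 1_000      # small fixed cost for memory-mapped access
BRAM_PER_WORD_NS = 200     # CPU-driven access cost per word

for n in (16, 4096):
    dma = transfer_latency_ns(n, DMA_SETUP_NS, DMA_PER_WORD_NS)
    bram = transfer_latency_ns(n, BRAM_SETUP_NS, BRAM_PER_WORD_NS)
    print(f"{n} words: DMA {dma} ns, BRAM {bram} ns")

print("crossover at",
      crossover_words(DMA_SETUP_NS, DMA_PER_WORD_NS,
                      BRAM_SETUP_NS, BRAM_PER_WORD_NS),
      "words")
```

With these placeholder values the memory-mapped path wins for small transfers and the DMA path wins once the payload is large enough to amortize its setup cost. Where the real crossover sits on the XC7Z020 depends on measured setup and per-word costs, which this sketch does not attempt to estimate.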

Cite This Study

Güner Tatar (Fri,) studied this question.

synapsesocial.com/papers/69a3d830ec16d51705d2ed63 https://doi.org/10.62520/fujece.1790038
