The explosive growth of temporal graph data has led to significant training overheads for Dynamic Graph Neural Networks (DGNNs), a bottleneck primarily driven by massive data movement between host processors and storage arrays across conventional PCIe I/O buses. While near-data processing with Computational Storage Devices (CSDs) can alleviate this bottleneck, a single CSD cannot meet the terabyte-scale capacity requirements and complex sequence modeling demands of modern large-scale DGNNs. Horizontal scaling with multi-CSD clusters over standard PCIe topologies presents a viable, cost-effective solution, yet our in-depth profiling identifies two critical bottlenecks in naive multi-CSD architectures: host-bounced memory copies significantly compromise inter-device communication efficiency, and sparse graph sampling frequently exceeds the capacity of the tightly constrained local DRAM of CSDs, resulting in excessive flash I/O and performance degradation. To address these interconnected bottlenecks, we propose M-DGNN, a hardware–software co-designed architecture optimized for standard PCIe interconnects. First, M-DGNN orchestrates direct peer-to-peer (P2P) DMA dataflows for inter-CSD hidden state exchange, bypassing host operating system intervention and reducing context-switching overhead. Second, we design a host-assisted caching strategy with a Host-Pinned Memory Extension (HPME) mechanism, which leverages host-pinned memory as an asynchronous DMA extension pool to shield resource-constrained CSDs from high-latency flash I/O during structural subgraph sampling. Extensive experimental evaluations across seven large-scale dynamic graph datasets demonstrate that M-DGNN delivers up to a 6.2× end-to-end speedup over state-of-the-art DGNN systems. This work establishes an efficient, scalable near-data computing paradigm for large-scale DGNN training.
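The host-assisted caching idea can be illustrated with a toy model. The sketch below is not M-DGNN's implementation: it assumes a simple LRU policy in both tiers, treats flash as a dictionary, and ignores asynchronous DMA, memory pinning, and timing entirely. All names (`LRU`, `TieredStore`, `run`) are hypothetical. It only shows how a larger host-side pool behind a small device DRAM cache might absorb misses that would otherwise go to flash.

```python
from collections import OrderedDict


class LRU:
    """Fixed-capacity LRU map; put() returns the evicted (key, value) or None."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            return self.data.popitem(last=False)  # evict least recently used
        return None


class TieredStore:
    """Toy lookup path: device DRAM -> host pool (assumed stand-in for HPME) -> flash."""

    def __init__(self, dram_slots, host_slots, flash):
        self.dram = LRU(dram_slots)
        self.host = LRU(host_slots) if host_slots > 0 else None
        self.flash = flash
        self.flash_reads = 0
        self.host_hits = 0

    def fetch(self, node):
        value = self.dram.get(node)
        if value is not None:
            return value
        if self.host is not None:
            value = self.host.get(node)
        if value is not None:
            self.host_hits += 1
        else:
            value = self.flash[node]  # slow path in this model
            self.flash_reads += 1
        evicted = self.dram.put(node, value)
        if evicted is not None and self.host is not None:
            self.host.put(*evicted)  # spill DRAM evictions to the host pool
        return value


def run(host_slots):
    """Replay a cyclic access trace and return the number of flash reads."""
    flash = {n: [n] for n in range(8)}  # hypothetical neighbor lists
    store = TieredStore(dram_slots=2, host_slots=host_slots, flash=flash)
    for node in list(range(8)) * 3:
        store.fetch(node)
    return store.flash_reads


if __name__ == "__main__":
    print("flash reads without host pool:", run(0))
    print("flash reads with host pool:   ", run(8))
```

In this model, a cyclic trace that does not fit in the two-slot DRAM cache misses on every access when there is no host pool, while a pool large enough to hold the working set leaves only the cold first-pass reads going to flash. Real behavior would depend on the sampling pattern, pool size, and DMA latency, none of which are modeled here.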
Junhao Zhu
Xiaotong Han
Wenqing Wang
Electronics
National University of Defense Technology
Zhu et al. (Mon,) studied this question.
www.synapsesocial.com/papers/69df2b49e4eeef8a2a6b0435 — DOI: https://doi.org/10.3390/electronics15081620