What question did this study set out to answer?

The research aims to address performance challenges in deep graph neural networks through a high-efficiency 3D-CIM accelerator.

May 22, 2026 · Open Access

A high-efficiency 3D-stacked accelerator for deep graph neural network inference

Key Points

Proposed the G3DMA architecture, designed via hardware-software co-optimization
Implemented differentiated compression techniques for adjacency matrices and node features
Developed a three-stage model augmented with sparsity-aware scheduling and zero-skipping
G3DMA achieved speedups of 6007.41× compared to CPUs and 106.28× against GPUs
Outperformed state-of-the-art accelerators (HyGCN, GCIM, GCNim, and SGCN) by up to 26.27×
Consistently improved both performance and energy efficiency compared to existing designs
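
As a rough illustration of the bitmap-based compression mentioned in the key points, the sketch below packs one block of values into a presence bitmap plus a dense list of nonzero values. It is a generic bitmap-value encoding, not the paper's DBSR or BSR layout, which this page does not specify; the function names and the 8-element block size are assumptions made for the example.

```python
from typing import List, Tuple


def encode_block(block: List[float]) -> Tuple[int, List[float]]:
    """Pack a block as (bitmap, values); bit i is set when block[i] is nonzero."""
    bitmap = 0
    values: List[float] = []
    for i, x in enumerate(block):
        if x != 0:
            bitmap |= 1 << i
            values.append(x)
    return bitmap, values


def decode_block(bitmap: int, values: List[float], size: int) -> List[float]:
    """Rebuild the dense block from its bitmap and packed values."""
    out = [0.0] * size
    packed = iter(values)
    for i in range(size):
        if (bitmap >> i) & 1:
            out[i] = next(packed)
    return out


def lookup(bitmap: int, values: List[float], i: int) -> float:
    """Read element i without decoding the whole block."""
    if not (bitmap >> i) & 1:
        return 0.0
    # The index into `values` is the number of set bits below position i.
    rank = bin(bitmap & ((1 << i) - 1)).count("1")
    return values[rank]


# Hypothetical 8-element block with two nonzero entries.
block = [0.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
bitmap, values = encode_block(block)
restored = decode_block(bitmap, values, len(block))
```

Keeping the bitmap apart from the values means a reader can test for a zero with a single bit check and only touch the value list for entries that actually exist, which is the general reason such encodings reduce memory traffic on sparse data.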

Abstract

Three-dimensional compute-in-memory (3D-CIM) architectures, with their high bandwidth and strong parallelism, provide significant opportunities for accelerating graph neural network (GNN) inference. However, existing 3D-CIM accelerators still face two major challenges when handling deep graph neural networks (DeGNNs): (1) insufficient support for layer-wise sparsity, where zero values lead to redundant memory accesses and ineffective computations, resulting in reduced bandwidth utilization and increased latency; and (2) lack of native support for cross-layer residual dependencies, where frequent data movement incurs additional storage and communication overhead, further exacerbating inference latency. To address these issues, we propose G3DMA, a high-efficiency 3D-CIM accelerator designed through hardware-software co-optimization for DeGNN inference. For sparse encoding, G3DMA employs differentiated compression: adjacency matrices are stored using Dual-Bitmap Sparse Representation (DBSR), while node features adopt a bitmap-value separated Block Sparse Representation (BSR), significantly reducing DRAM access overhead while improving compression ratio and indexing efficiency. At the execution level, we design a three-stage model (“combination-intermediate aggregation-residual accumulation”) augmented with sparsity-aware scheduling and zero-skipping, thereby avoiding full materialization of intermediate results and reducing ineffective computations. At the hardware level, G3DMA integrates lightweight compute arrays and dedicated codec units in the near-memory logic layer, efficiently supporting DBSR/BSR processing and block-wise memory accesses; it further implements a three-stage dataflow with pipelined control, residual-friendly accumulation paths, and low-overhead cross-vault routing. Experimental results demonstrate that G3DMA achieves speedups of 6007.41× and 106.28× over advanced CPU and GPU platforms, respectively. Compared with the latest state-of-the-art (SOTA) accelerators (HyGCN, GCIM, GCNim, and SGCN), G3DMA delivers 26.27×, 11.93×, 1.46×, and 2.42× performance improvements, respectively, and consistently outperforms SOTA designs in both performance and energy efficiency.
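
To make the three-stage execution model in the abstract more concrete, here is a minimal software sketch of one residual GNN layer with zero-skipping. It is only an illustration under stated assumptions: sum aggregation over an adjacency list, an identity residual (so input and output widths match), and no activation or normalization. Unlike the design described above, it materializes the intermediate result for simplicity. All function names are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def combine(x: Matrix, w: Matrix) -> Matrix:
    """Stage 1 (combination): H = X @ W, skipping zero entries of X."""
    out_dim = len(w[0])
    h = [[0.0] * out_dim for _ in x]
    for r, row in enumerate(x):
        for k, xv in enumerate(row):
            if xv == 0.0:
                continue  # zero-skipping: no multiply and no read of W's row k
            for c in range(out_dim):
                h[r][c] += xv * w[k][c]
    return h


def aggregate(neighbors: Dict[int, List[int]], h: Matrix) -> Matrix:
    """Stage 2 (aggregation): sum H over each node's stored neighbours only."""
    out = [[0.0] * len(h[0]) for _ in h]
    for v, nbrs in neighbors.items():
        for u in nbrs:  # absent edges (adjacency zeros) are never visited
            for c, hv in enumerate(h[u]):
                out[v][c] += hv
    return out


def add_residual(agg: Matrix, x: Matrix) -> Matrix:
    """Stage 3 (residual accumulation): add the layer input back in."""
    return [[a + b for a, b in zip(agg_row, x_row)]
            for agg_row, x_row in zip(agg, x)]


def layer(neighbors: Dict[int, List[int]], x: Matrix, w: Matrix) -> Matrix:
    """Run the three stages in order for one layer."""
    return add_residual(aggregate(neighbors, combine(x, w)), x)


# Hypothetical 3-node path graph 0 - 1 - 2 with sparse 2-wide features.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
x = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
w = [[1.0, 2.0], [3.0, 4.0]]
out = layer(neighbors, x, w)
```

In this toy input four of the six feature entries are zero, so the combination stage performs only two of the six row updates a dense multiply would need.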

Authors

Zhenyu Long

Yu Zhang

Yutao Fu

Journals

Scientia Sinica Informationis

Cite this study

Long et al. (2026) studied this question.

synapsesocial.com/papers/6a0ff42fd674f7c03778d512 — DOI: https://doi.org/10.1360/ssi-2025-0381
