September 25, 2023

GLARE: Accelerating Sparse DNN Inference Kernels with Global Memory Access Reduction

Abstract

Sparse deep neural networks (DNNs) leverage sparse representations to achieve faster inference and a lower memory footprint. However, deploying sparse DNNs comes with challenges such as irregular memory access patterns and workload imbalance. To address these challenges, IEEE HPEC has organized the Sparse DNN Graph Challenge (SDGC), seeking new methods from the high-performance computing community. Over the years, SDGC has yielded innovative works on accelerating sparse DNN inference. However, none of them has identified the redundant global memory access that contributes significant runtime overhead. To address this problem, we propose GLARE, a framework that helps existing sparse inference kernels effectively reduce redundant global memory access. We have applied GLARE to previous SDGC champions and to a recent sparse inference engine, SNICIT. Evaluated on SDGC benchmarks, GLARE demonstrates promising performance and generalizes across existing sparse inference kernels, achieving, for instance, up to 31.56× speed-up over one of the previous SDGC champions.
