Exploiting matrix symmetry to halve memory footprint offers a substantial opportunity for accelerating memory-bound computations like Sparse Matrix-Vector Multiplication (SpMV). However, symmetric SpMV incurs data conflicts when concurrently writing the output vector. Previous approaches fail to address this issue efficiently, i.e., either are non-scalable or yield poor performance for large high-bandwidth irregular matrices. This paper extends DCS-SpMV , a D ivide-and- C onquer (DC) based shared-memory implementation of S ymmetric SpMV. The key idea of DCS-SpMV is to recursively divide and reorder the matrix-induced conflict graph into independent subgraphs for parallel execution, and construct separate subgraphs to avoid data conflicts. The DC approach naturally transforms the input matrix into a low-conflict part and a high-conflict part, which motivates us to design a conflict-aware hybrid solution DCH-SpMV that executes these two parts using DCS-SpMV and the standard SpMV, respectively. We also develop a machine learning model for DCH-SpMV to predict the optimal number of DC recursions on a given matrix and architecture. In this work, we further optimize the hybrid DC implementation by reducing data conflicts before the DC preprocessing. First, we present a conflict-pruning strategy to decouple certain highly dense columns or rows from the conflict graph of a symmetric matrix. Second, we implement a heuristic to adaptively select the lower or upper triangular part of a symmetric matrix, leading to fewer data conflicts. Our optimizations not only facilitate the DC preprocessing, but also improve the performance of DCH-SpMV. We evaluate our work on both x86 and ARM multi-core CPUs using 298 symmetric sparse matrices from the SuiteSparse Matrix Collection. Our new optimizations improve the performance of previous version 42 by up to 4.89 ×, demonstrating significant speedup over the state-of-the-art approaches including the vendor-tuned Intel oneMKL library.
Qiu et al. (Mon,) studied this question.