What question did this study set out to answer?

The aim is to improve the efficiency of symmetric sparse matrix-vector multiplication (SpMV) while addressing data conflicts during parallel execution.

April 15, 2026Open Access

Towards Efficient Symmetric Sparse Matrix-Vector Multiplication on Multi-Cores

Key Points

The aim is to improve the efficiency of symmetric sparse matrix-vector multiplication (SpMV) while addressing data conflicts during parallel execution.
Extended DCS-SpMV implementation for symmetric SpMV using divide-and-conquer techniques.
Developed a conflict-aware hybrid solution (DCH-SpMV) to handle low and high-conflict parts separately.
Introduced a conflict-pruning strategy to optimize the conflict graph of a symmetric matrix.
Implemented heuristics for adaptive selection of triangular matrix parts for fewer conflicts.
Achieved up to 4.89× performance improvement over previous versions of the DCS-SpMV.
Demonstrated significant speedup compared to state-of-the-art methods, including Intel oneMKL library.

Abstract

Exploiting matrix symmetry to halve memory footprint offers a substantial opportunity for accelerating memory-bound computations like Sparse Matrix-Vector Multiplication (SpMV). However, symmetric SpMV incurs data conflicts when concurrently writing the output vector. Previous approaches fail to address this issue efficiently, i.e., either are non-scalable or yield poor performance for large high-bandwidth irregular matrices. This paper extends DCS-SpMV , a D ivide-and- C onquer (DC) based shared-memory implementation of S ymmetric SpMV. The key idea of DCS-SpMV is to recursively divide and reorder the matrix-induced conflict graph into independent subgraphs for parallel execution, and construct separate subgraphs to avoid data conflicts. The DC approach naturally transforms the input matrix into a low-conflict part and a high-conflict part, which motivates us to design a conflict-aware hybrid solution DCH-SpMV that executes these two parts using DCS-SpMV and the standard SpMV, respectively. We also develop a machine learning model for DCH-SpMV to predict the optimal number of DC recursions on a given matrix and architecture. In this work, we further optimize the hybrid DC implementation by reducing data conflicts before the DC preprocessing. First, we present a conflict-pruning strategy to decouple certain highly dense columns or rows from the conflict graph of a symmetric matrix. Second, we implement a heuristic to adaptively select the lower or upper triangular part of a symmetric matrix, leading to fewer data conflicts. Our optimizations not only facilitate the DC preprocessing, but also improve the performance of DCH-SpMV. We evaluate our work on both x86 and ARM multi-core CPUs using 298 symmetric sparse matrices from the SuiteSparse Matrix Collection. Our new optimizations improve the performance of previous version 42 by up to 4.89 ×, demonstrating significant speedup over the state-of-the-art approaches including the vendor-tuned Intel oneMKL library.

Towards Efficient Symmetric Sparse Matrix-Vector Multiplication on Multi-Cores

Key Points

Abstract

Cite This Study