Sparse matrix–matrix multiplication (SpMM) is a fundamental operation in scientific computing with broad applications across numerous domains. Tiling is a key optimization technique for improving data locality and is widely adopted in high-performance computing. However, the irregular data access patterns inherent to SpMM make it challenging to exploit tiling effectively for data reuse. In this paper, we propose MaSpMM, a memory-aware SpMM framework that integrates cache-aware tiling with a segment-oriented data layout. MaSpMM stores matrices as continuous segments to enhance data locality within each tile. Moreover, since many sparse matrices in real-world applications exhibit symmetry, we further develop MaSpMM-Sym, an extension that recursively partitions symmetric matrices to eliminate write conflicts and further improve locality. To adapt to diverse scenarios, we finally introduce MaSpMM-Adap, which adaptively selects the most suitable approach for each input matrix. Comprehensive evaluations on both x86 and ARM CPUs demonstrate that MaSpMM-Adap achieves average speedups of up to 1.86× over Intel oneMKL, 1.84× over ASpT, and 1.75× over J-Stream.
Bi et al. studied this question.