General Matrix Multiplication (GEMM) is a fundamental computational kernel in scientific computing, serving as the foundation for numerous complex tasks. However, in practical applications, the performance of GEMM is often constrained by irregular matrix dimensions and the diversity of hardware architectures. In particular, when processing small and irregular matrices, GEMM typically exhibits reduced computational efficiency. To address these challenges, this paper proposes a GEMM acceleration method based on an adaptive core grouping strategy. The method consists of two key components: a core grouping mechanism that alleviates workload imbalance among multi-core CPUs, and an adaptive block partitioning algorithm that dynamically selects optimal tiling schemes according to the matrix dimensions, achieving both load balance and cache-friendly data access. Experimental results on the Kunpeng CPU platform demonstrate that the proposed method achieves significant performance improvements compared to the Kunpeng KML math library, reaching a peak acceleration of up to 2.1× and an average speedup of 1.64×. These results validate the effectiveness and efficiency of the proposed approach in handling small and irregular matrix computation scenarios.
Zhou et al. (Fri,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: