This paper introduces a suite of exact overlays for 4x4 matrix multiplication that reduce the counted number of multiplications without altering the underlying bilinear complexity of the base algorithm, such as Strassen's 49-multiplication algorithm or the optimal 48-multiplication algorithm. We present several value-aware techniques that exploit the numerical properties of the input matrices. The primary contributions include the 'Peeler' method, a per-leaf mode subtraction technique with a provably tight cost law; 'Value-Aware Collapse', a counting model that treats multiplications by -1, 0, or 1 as free; and a 'Block Peeler' for optimal algorithms. To further amplify the effectiveness of these methods, we introduce zero-arithmetic-cost structural modifications, including 'Permutation Clustering' and 'Inner Sign-Perm Reindexing'. Additionally, we propose a 'Bit-Sliced GEMM' path for integer and fixed-point matrices that eliminates scalar multiplications entirely, replacing them with efficient bitwise operations. We derive closed-form expressions for the expected computational savings for various discrete input distributions and validate our findings with a comprehensive experimental protocol. The proposed overlays preserve the worst-case performance of the base algorithms while delivering substantial computational gains on structured, quantized, or sparse data, offering a practical path to accelerating matrix multiplication in a variety of real-world applications.
Michael Rey (Fri,) studied this question.