The quest to multiply two large matrices as fast as possible has intrigued researchers for several decades. However, the ‘optimal’ algorithm for a given problem size is still not known. The fast matrix multiplication (FMM) problem can be formulated as a non-convex optimization problem, more specifically as a challenging tensor decomposition problem. In this work, we build upon a state-of-the-art augmented Lagrangian algorithm, which formulates the FMM problem as a constrained least squares problem, by incorporating a new, generalized cyclic symmetric (CS) structure in the decomposition. This structure decreases the number of variables, thereby shrinking the large search space and reducing the computational cost per iteration. The constraints are used to find practical solutions, i.e., decompositions with simple coefficients, which yield fast algorithms when implemented in hardware. For the FMM problem, a very large number of starting points is usually necessary to converge to a solution. Extensive numerical experiments for different problem sizes demonstrate that including this structure yields more ‘unique’ practical decompositions for a fixed number of starting points, where uniqueness is defined relative to the known scale and trace invariance transformations that hold for all FMM decompositions. Making practical decompositions easier to find may lead to the discovery of faster FMM algorithms when combined with sufficient computational power. Lastly, we show that the CS structure reduces the cost of multiplying a matrix by itself.
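To make the tensor formulation and the claimed reduction in variables concrete, here is a minimal pure-Python sketch. It assumes the common "trace" indexing of the FMM tensor (entry ((i,j),(j,k),(k,i)) equals 1), under which the tensor is invariant under a cyclic shift of its three modes, and it uses the standard (non-generalized) CS form: S symmetric terms a⊗a⊗a plus T cyclic triples a⊗b⊗c + c⊗a⊗b + b⊗c⊗a, for rank R = S + 3T. The paper's generalized CS structure may be parameterized differently. The helper names are hypothetical, and the decomposition built here is the trivial rank-n³ one, not a fast algorithm and not one found by the augmented Lagrangian method.

```python
from itertools import product


def matmul_tensor(n):
    """FMM tensor for n x n matrices in trace form (assumed convention):
    entry [(i,j)][(j,k)][(k,i)] is 1, all others 0. Stored as nested lists."""
    N = n * n
    T = [[[0] * N for _ in range(N)] for _ in range(N)]
    for i, j, k in product(range(n), repeat=3):
        T[i * n + j][j * n + k][k * n + i] = 1
    return T


def is_cyclic_symmetric(T):
    """True if T[a][b][c] == T[b][c][a] for all index triples."""
    N = len(T)
    return all(T[a][b][c] == T[b][c][a] for a, b, c in product(range(N), repeat=3))


def unit(n, i, j):
    """Vectorized n x n matrix unit E_ij (a list of length n^2)."""
    v = [0] * (n * n)
    v[i * n + j] = 1
    return v


def add_outer(x, y, z, out):
    """Accumulate the rank-1 tensor x (outer) y (outer) z into out (equal lengths)."""
    for a, b, c in product(range(len(x)), repeat=3):
        out[a][b][c] += x[a] * y[b] * z[c]


def cs_reconstruct(sym, triples, N):
    """Standard CS form: sum_s a_s⊗a_s⊗a_s + sum_t (a⊗b⊗c + c⊗a⊗b + b⊗c⊗a)."""
    out = [[[0] * N for _ in range(N)] for _ in range(N)]
    for v in sym:
        add_outer(v, v, v, out)
    for a, b, c in triples:
        add_outer(a, b, c, out)
        add_outer(c, a, b, out)
        add_outer(b, c, a, out)
    return out


def trivial_cs_decomposition(n):
    """Group the n^3 terms E_ij⊗E_jk⊗E_ki into S = n symmetric terms and
    T = (n^3 - n) / 3 cyclic triples, so R = S + 3T = n^3 (not a fast algorithm)."""
    sym, triples, seen = [], [], set()
    for i, j, k in product(range(n), repeat=3):
        if i == j == k:
            sym.append(unit(n, i, i))
        elif (i, j, k) not in seen:
            # (i,j,k), (j,k,i), (k,i,j) are covered by one CS triple
            seen.update({(i, j, k), (j, k, i), (k, i, j)})
            triples.append((unit(n, i, j), unit(n, j, k), unit(n, k, i)))
    return sym, triples


def variable_counts(N, S, T):
    """Unknowns of an unstructured rank-(S+3T) CPD vs. the CS-structured one."""
    R = S + 3 * T
    return 3 * N * R, N * (S + 3 * T)


if __name__ == "__main__":
    n = 2
    sym, triples = trivial_cs_decomposition(n)
    print(cs_reconstruct(sym, triples, n * n) == matmul_tensor(n))  # True
    print(variable_counts(n * n, len(sym), len(triples)))  # (96, 32)
```

For a fixed rank, the standard CS form needs one third of the unknowns of an unstructured decomposition, which illustrates the kind of search-space reduction described above.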
Vermeylen et al. studied this question.