What type of study is this?

This is a quantitative study.

September 23, 2025 | Open Access

Exploiting Generalized Cyclic Symmetry to Find Fast Rectangular Matrix Multiplication Algorithms Easier

Key Points

Incorporating a generalized cyclic symmetric (CS) structure in the tensor decomposition reduces the number of variables, which makes fast matrix multiplication (FMM) algorithms easier to find.
The FMM problem is posed as a constrained least squares problem, and the uniqueness of the resulting decompositions is measured relative to the known scale and trace invariance transformations.
Extensive numerical experiments across different problem sizes show that, for a fixed number of starting points, the new structure yields more unique practical decompositions.
Making practical decompositions easier to find may lead to the discovery of faster FMM algorithms, given sufficient computational power.
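
As a rough illustration of why a cyclic symmetric structure cuts the number of variables, the sketch below covers only the classical square case, not the generalized structure studied in the paper. It assumes the trace form of the matrix multiplication tensor, which is unchanged when its three modes are rotated because trace(ABC) = trace(BCA). A cyclic symmetric decomposition then consists of S symmetric terms a⊗a⊗a and T triples a⊗b⊗c + b⊗c⊗a + c⊗a⊗b, for a rank of S + 3T. All function names and the example vectors are hypothetical and chosen only for illustration; the vectors are arbitrary and do not decompose the matrix multiplication tensor.

```python
# Sketch of classical cyclic symmetry for square n x n matrix multiplication.
# All names are illustrative assumptions, not taken from the paper.

def trace_tensor(n):
    """T[a][b][c] is the coefficient of A[a]*B[b]*C[c] in trace(A B C),
    with each matrix flattened row-major into a vector of length n*n."""
    size = n * n
    T = [[[0] * size for _ in range(size)] for _ in range(size)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                T[i * n + j][j * n + k][k * n + i] = 1
    return T

def is_cyclic(T):
    """Check invariance under a cyclic rotation of the three modes."""
    size = len(T)
    return all(T[a][b][c] == T[b][c][a]
               for a in range(size) for b in range(size) for c in range(size))

def expand_cs(sym, triples):
    """Expand cyclic symmetric parameters into ordinary rank-one terms (u, v, w)."""
    terms = [(a, a, a) for a in sym]
    for a, b, c in triples:
        terms += [(a, b, c), (b, c, a), (c, a, b)]
    return terms

def build(terms, size):
    """Sum the rank-one terms u (x) v (x) w into a full tensor."""
    T = [[[0] * size for _ in range(size)] for _ in range(size)]
    for u, v, w in terms:
        for a in range(size):
            for b in range(size):
                for c in range(size):
                    T[a][b][c] += u[a] * v[b] * w[c]
    return T

# n = 2: one symmetric vector and two triples give rank 1 + 3*2 = 7.
sym = [[1, 0, 0, 1]]
triples = [([1, 2, 0, -1], [0, 1, 1, 0], [3, 0, -2, 1]),
           ([0, 0, 1, 1], [1, -1, 0, 2], [2, 1, 0, 0])]
terms = expand_cs(sym, triples)

cs_variables = 4 * (len(sym) + 3 * len(triples))  # 28 free entries
general_variables = 3 * 4 * len(terms)            # 84 free entries
```

In this setting the structured parametrization has one third of the variables of an unstructured rank-7 decomposition, and any tensor built from it is cyclically invariant by construction.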

Abstract

The quest to multiply two large matrices as fast as possible has intrigued researchers for several decades. However, the ‘optimal’ algorithm for a given problem size is still not known. The fast matrix multiplication (FMM) problem can be formulated as a non-convex optimization problem, more specifically as a challenging tensor decomposition problem. In this work, we build upon a state-of-the-art augmented Lagrangian algorithm, which formulates the FMM problem as a constrained least squares problem, by incorporating a new, generalized cyclic symmetric (CS) structure in the decomposition. This structure decreases the number of variables, thereby reducing the large search space and the computational cost per iteration. The constraints are used to find practical solutions, i.e., decompositions with simple coefficients, which yield fast algorithms when implemented in hardware. For the FMM problem, a very large number of starting points is usually necessary to converge to a solution. Extensive numerical experiments for different problem sizes demonstrate that including this structure yields more ‘unique’ practical decompositions for a fixed number of starting points. Uniqueness is defined relative to the known scale and trace invariance transformations that hold for all FMM decompositions. Making it easier to find practical decompositions may lead to the discovery of faster FMM algorithms when used in combination with sufficient computational power. Lastly, we show that the CS structure reduces the cost of multiplying a matrix by itself.
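
To make the tensor decomposition view concrete, here is a minimal sketch using Strassen's classical rank-7 algorithm for 2 x 2 matrices, a well-known example that is not a result of this paper. It builds the matrix multiplication tensor for an m x p by p x n product and evaluates the unconstrained least squares residual of a candidate decomposition; a residual of zero means the factors define a valid multiplication algorithm. The function names and the row-major flattening convention are assumptions made for illustration, and the paper's augmented Lagrangian method and constraints are not reproduced here.

```python
# Sketch: fast matrix multiplication as an exact tensor decomposition.
# Names and index conventions are illustrative assumptions.

def matmul_tensor(m, p, n):
    """T[a][b][c] = 1 when A[a]*B[b] contributes to C[c] in C = A B,
    with A (m x p), B (p x n), C (m x n) flattened row-major."""
    T = [[[0] * (m * n) for _ in range(p * n)] for _ in range(m * p)]
    for i in range(m):
        for k in range(p):
            for j in range(n):
                T[i * p + k][k * n + j][i * n + j] = 1
    return T

def residual(T, factors):
    """Sum of squared differences between T and sum_r u_r (x) v_r (x) w_r."""
    total = 0
    for a, slab in enumerate(T):
        for b, row in enumerate(slab):
            for c, t in enumerate(row):
                approx = sum(u[a] * v[b] * w[c] for u, v, w in factors)
                total += (t - approx) ** 2
    return total

# Strassen's seven products as (u, v, w) coefficient vectors over
# (X11, X12, X21, X22) of A, B, and C respectively.
STRASSEN = [
    ([1, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 1]),
    ([0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 1, -1]),
    ([1, 0, 0, 0], [0, 1, 0, -1], [0, 1, 0, 1]),
    ([0, 0, 0, 1], [-1, 0, 1, 0], [1, 0, 1, 0]),
    ([1, 1, 0, 0], [0, 0, 0, 1], [-1, 1, 0, 0]),
    ([-1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]),
    ([0, 1, 0, -1], [0, 0, 1, 1], [1, 0, 0, 0]),
]

def multiply(A, B, factors):
    """Multiply A and B with one scalar product per rank-one term.
    Returns C flattened row-major."""
    a = [x for row in A for x in row]
    b = [x for row in B for x in row]
    c = [0] * len(factors[0][2])
    for u, v, w in factors:
        prod = (sum(ui * ai for ui, ai in zip(u, a))
                * sum(vi * bi for vi, bi in zip(v, b)))
        for idx, wi in enumerate(w):
            c[idx] += wi * prod
    return c

print(residual(matmul_tensor(2, 2, 2), STRASSEN))            # 0
print(multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]], STRASSEN))  # [19, 22, 43, 50]
```

The number of rank-one terms equals the number of scalar multiplications, so Strassen's factors use 7 products where the standard algorithm uses 8. Finding a decomposition with fewer terms for a given problem size is the non-convex search problem the abstract refers to.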
