This article considers matrix multiplication in the problem of finding the transitive closure of a binary relation with the transitivity property, as well as in the construction of the reachability and counter-reachability matrices in general graphs. An analysis of approaches to practical implementation for finding the transitive closure of a binary relation is presented: the Floyd-Warshall algorithm and raising the adjacency matrix to a power until it stabilises. The problem of processing large (thousands to millions of elements) graph diagrams of parallel algorithms on a processor (CPU), and the primary methods for optimising matrix calculations at both the software (algorithmic) and hardware levels, are considered. The main types of digital devices based on the parallel-pipeline data-processing principle are identified, and their advantages and disadvantages are outlined. A specialised computing device for fast multiplication of square binary matrices of size n × n is considered, whose distinctive feature is pipelining the data read operation from a specialised multiport memory. A mathematical model and a method for organising the parallel-pipeline memory of a specialised square binary matrix multiplication device are presented. An estimate of the matrix-processing time and hardware complexity for the developed and prototype devices is presented. Computational experiments showed that, despite a slightly higher hardware complexity (up to 8.8×) than the prototype device, the proposed device multiplies square binary matrices of size n ≤ 512 up to 52.4× faster. This represents a significant advantage when implemented in a semi-custom design using field-programmable gate arrays or a custom design based on application-specific integrated circuits. In this paper, we present a novel systolic device whose core innovation is a pipelined multiport memory architecture. By ensuring a continuous, high-bandwidth data flow to the processing elements, our contribution enables the systolic array to operate at its theoretical peak performance.
Najajra et al. (Mon,) studied this question.