What question did this study set out to answer?

The aim is to develop a device that efficiently multiplies square binary matrices for applications in graph theory.

February 2, 2026Open Access

A Specialised Computing Device for Multiplying Square Binary Matrices Based on Multiport Parallel-Pipelined Memory

Key Points

The aim is to develop a device that efficiently multiplies square binary matrices for applications in graph theory.
Analyzed matrix multiplication methods including Floyd-Warshall algorithm and adjacency matrix powers.
Identified digital devices based on parallel-pipeline data-processing principles.
Developed mathematical models for a specialized computing device featuring pipelined multiport memory.
Conducted computational experiments to compare processing times and hardware complexities.
The proposed device multiplied square binary matrices of size n ≤ 512 up to 52.4× faster than a prototype device.
The hardware complexity increased by up to 8.8× compared to the prototype, which is offset by significant performance gains.

Abstract

This article considers matrix multiplication in the problem of finding the transitive closure of a binary relation with the transitivity property, as well as in the construction of the reachability and counter-reachability matrices in general graphs. An analysis of approaches to practical implementation for finding the transitive closure of a binary relation is presented: the Floyd-Warshall algorithm and raising the adjacency matrix to a power until it stabilises. The problem of processing large (thousands to millions of elements) graph diagrams of parallel algorithms on a processor (CPU), and the primary methods for optimising matrix calculations at both the software (algorithmic) and hardware levels, are considered. The main types of digital devices based on the parallel-pipeline data-processing principle are identified, and their advantages and disadvantages are outlined. A specialised computing device for fast multiplication of square binary matrices of size n × n is considered, whose distinctive feature is pipelining the data read operation from a specialised multiport memory. A mathematical model and a method for organising the parallel-pipeline memory of a specialised square binary matrix multiplication device are presented. An estimate of the matrix-processing time and hardware complexity for the developed and prototype devices is presented. Computational experiments showed that, despite a slightly higher hardware complexity (up to 8.8×) than the prototype device, the proposed device multiplies square binary matrices of size n ≤ 512 up to 52.4× faster. This represents a significant advantage when implemented in a semi-custom design using field-programmable gate arrays or a custom design based on application-specific integrated circuits. In this paper, we present a novel systolic device whose core innovation is a pipelined multiport memory architecture. By ensuring a continuous, high-bandwidth data flow to the processing elements, our contribution enables the systolic array to operate at its theoretical peak performance.

A Specialised Computing Device for Multiplying Square Binary Matrices Based on Multiport Parallel-Pipelined Memory

Key Points

Abstract

Cite This Study