What question did this study set out to answer?

The aim is to develop resource-efficient architectures for 2D convolution using kernel separability techniques.

February 14, 2026 | Open Access

High-performance 2D convolution architectures based on kernel separability

Key Points

The aim is to develop resource-efficient architectures for 2D convolution using kernel separability techniques.
Developed two architectures: folded separable convolution and register minimized folded separable convolution.
Applied pre-processing, mainly retiming, before the folding transformation to ensure timing closure of the folded architecture.
Minimized register usage to enhance efficiency without increasing complexity.
Tested architectures on various FPGA devices such as Artix-7 and Virtex-7.
Achieved significant reductions in power dissipation and memory bandwidth compared to existing architectures.
Reduced hardware utilization and improved timing performance.
Slight decrease in throughput observed, balanced by lower resource requirements.

Abstract

Two-dimensional (2D) convolution is a low-level processing algorithm used in numerous image processing applications. The significant computing overhead associated with the algorithm often limits the kernel size used in the operation. Kernel separability is one of the techniques used to reduce the computational complexity of 2D convolution. This paper presents two novel resource-efficient 2D convolution architectures based on kernel separability and the folding transformation: the folded separable convolution architecture and the register minimized folded separable convolution architecture. The precedence constraints between the different functional units in the convolution architecture necessitate some pre-processing before the folding transformation so that the top-level functionality is not compromised. This mainly involves retiming the inherent convolution architecture to ensure complete timing closure of the folded architecture. While retiming enables the architecture to be clocked at higher frequencies, it often has an associated register overhead. To avoid this, a register minimization technique is applied in a novel way, reducing the number of registers to a minimum without increasing the complexity of the architecture. In addition, the proposed architectures do not use any line buffer to store interim results, considerably decreasing the on-chip resources and power consumption. The proposed architectures are implemented on different FPGA devices such as Artix-7, Virtex-7, Virtex-4, Cyclone V, and Zynq-7000. Compared to existing architectures, the proposed architectures show a significant reduction in power dissipation, memory bandwidth, hardware utilization, and timing at the expense of a slight decrease in throughput.
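
To make the kernel separability idea in the abstract concrete, the following is a minimal software sketch, not the paper's hardware architecture. It assumes the 2D kernel can be written as the outer product of a column vector and a row vector (a Sobel kernel is used as the example), and checks that two 1D passes give the same result as direct 2D convolution. The function names `conv2d_valid` and `conv2d_separable` are hypothetical, chosen only for illustration. Unlike the proposed architectures, this sketch stores the full intermediate result in memory.

```python
def conv2d_valid(image, kernel):
    """Direct 2D convolution over the 'valid' region: kh * kw multiplies per output."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # The kernel is indexed in reverse (flipped), as convolution requires.
                    acc += image[r + i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
            out[r][c] = acc
    return out


def conv2d_separable(image, col, row):
    """Same result when kernel[a][b] == col[a] * row[b]: kh + kw multiplies per output."""
    kh, kw = len(col), len(row)
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    # Horizontal 1D pass over every image row.
    tmp = [
        [sum(line[c + j] * row[kw - 1 - j] for j in range(kw)) for c in range(ow)]
        for line in image
    ]
    # Vertical 1D pass over the intermediate result.
    return [
        [sum(tmp[r + i][c] * col[kh - 1 - i] for i in range(kh)) for c in range(ow)]
        for r in range(oh)
    ]


# Example: the 3x3 Sobel kernel is the outer product of [1, 2, 1] and [1, 0, -1].
col, row = [1, 2, 1], [1, 0, -1]
kernel = [[a * b for b in row] for a in col]
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(5)]

# Both approaches agree exactly on integer data.
assert conv2d_valid(image, kernel) == conv2d_separable(image, col, row)
```

For a K x K separable kernel, the direct form needs K * K multiplications per output pixel while the two-pass form needs 2 * K, which is the reduction in computational complexity that the abstract attributes to kernel separability.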
