Abstract: Two-dimensional (2D) convolution is a low-level processing algorithm used in numerous image processing applications. The significant computing overhead associated with the algorithm often limits the kernel size used in the operation. Kernel separability is one of the techniques used to reduce the computational complexity of 2D convolution. This paper presents two novel resource-efficient 2D convolution architectures based on kernel separability and the folding transformation: a folded separable convolution architecture and a register-minimized folded separable convolution architecture. The precedence constraints between the functional units in the convolution architecture require some pre-processing before the folding transformation so that the top-level functionality is not compromised. This mainly involves retiming the inherent convolution architecture to ensure complete timing closure of the folded architecture. While retiming enables the architecture to be clocked at higher frequencies, it often carries a register overhead. To avoid this overhead, the register minimization technique is applied in a novel way, reducing the number of registers to a minimum without increasing the complexity of the architecture. In addition, the proposed architectures do not use any line buffers to store interim results, which considerably decreases on-chip resource usage and power consumption. The proposed architectures are implemented on several FPGA devices, including Artix-7, Virtex-7, Virtex-4, Cyclone V, and Zynq-7000. Compared to existing architectures, the proposed architectures show a significant reduction in power dissipation, memory bandwidth, hardware utilization, and timing at the expense of a slight decrease in throughput.
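To illustrate the kernel separability mentioned above, the following is a minimal software sketch and not the proposed hardware architecture. It assumes a kernel that can be written as the outer product of a column vector and a row vector, and checks that a horizontal 1D pass followed by a vertical 1D pass reproduces the direct 2D result. The function names (`conv2d_direct`, `conv2d_separable`, `outer`) and the sample data are hypothetical, the sliding window is computed without flipping the kernel, and only the "valid" output region is produced. Unlike the architectures described here, the sketch holds a full intermediate array between the two passes, and it does not model folding, retiming, or register minimization.

```python
def outer(col_vec, row_vec):
    """Build the 2D kernel as the outer product of a column and a row vector."""
    return [[a * b for b in row_vec] for a in col_vec]


def conv2d_direct(image, kernel):
    """Direct 2D sliding-window sum over the 'valid' region (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv2d_separable(image, col_vec, row_vec):
    """Same result via a horizontal 1D pass followed by a vertical 1D pass."""
    kh, kw = len(col_vec), len(row_vec)
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # Horizontal pass: filter every image row with row_vec.
    horiz = [
        [sum(row[c + j] * row_vec[j] for j in range(kw)) for c in range(out_w)]
        for row in image
    ]
    # Vertical pass: filter every column of the intermediate result with col_vec.
    return [
        [sum(horiz[r + i][c] * col_vec[i] for i in range(kh))
         for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    # Hypothetical 4x5 test image and a 3x3 separable (Sobel-like) kernel.
    image = [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 13, 12, 15, 14],
        [0, 3, 1, 4, 2],
    ]
    col_vec, row_vec = [1, 2, 1], [1, 0, -1]
    direct = conv2d_direct(image, outer(col_vec, row_vec))
    separable = conv2d_separable(image, col_vec, row_vec)
    print(direct == separable)
```

For a K×K separable kernel, the direct form needs K² multiplications per output pixel, while the two 1D passes need 2K, which is the source of the complexity reduction.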
Hassan et al. (Sun,) studied this question.