What question did this study set out to answer?

The aim is to enhance 2D convolution hardware modules for embedded object detection by optimizing resource usage and maintaining performance.

April 22, 2026 | Open Access

An FPGA-Based Scalable High-Performance 2D Convolution

Key Points

The aim is to enhance 2D convolution hardware modules for embedded object detection by optimizing resource usage and maintaining performance.
Introduced resource optimization strategies including temporal and spatial memory sharing.
Proposed a new method for aligning weights using rotational displacement across kernel units.
Evaluated the approach through a case study on pedestrian detection using a support vector machine (SVM).
Reduced memory, logic elements, and registers by more than 50% compared to non-optimized solutions.
Reduced overall resources by almost 25% when using the image pyramid, without reducing detection performance.
Achieved full HD resolution with 14 levels of the image pyramid, outperforming existing SVM-based detectors.

Abstract

Embedded object detection systems demand 2D convolution hardware modules that consume fewer processing and storage resources while processing high-resolution frames at high performance. Existing solutions address performance, resource, and accuracy issues in isolation. This work introduces resource optimization strategies for 2D convolution modules, such as temporal and spatial memory sharing between kernel units. It also proposes a new strategy for aligning weights between units using rotational displacement, which allows a single memory to be divided among several kernel units. In a case study of pedestrian detection based on a support vector machine (SVM), the proposed solution reduced the amount of memory, logic elements, and registers by more than half compared to non-optimized solutions. Applied to the image pyramid, the proposed strategies reduced overall resources by almost a quarter. The new strategy did not reduce the detector’s performance, since it does not interrupt the processing flow. The proposed solution reached 14 levels of the image pyramid at full HD resolution, with better results in accuracy, processing performance, resource occupancy, and power dissipation than existing SVM-based pedestrian detectors. Adopting these strategies may provide promising results in embedded deep-learning models.
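The abstract does not spell out how rotational displacement lets several kernel units share one weight memory, so the following is only a minimal software sketch of one possible reading, reduced to one dimension. It assumes that K kernel units start their windows one sample apart and that a single circular weight store is rotated by one slot per cycle, so each unit always reads its own fixed slot. The function names (`direct_scores`, `shared_rotating_scores`) and the staggering scheme are hypothetical illustrations, not the authors' design.

```python
from collections import deque


def direct_scores(x, w):
    """Reference: sliding dot product, as if every window had its own weight copy."""
    k = len(w)
    return [sum(w[j] * x[s + j] for j in range(k)) for s in range(len(x) - k + 1)]


def shared_rotating_scores(x, w):
    """Same result using ONE circular weight store shared by k staggered units.

    Assumption (not from the source): unit u handles windows starting at
    positions u, u + k, u + 2k, ... and always reads slot u of the store.
    """
    k = len(w)
    # At cycle t, slot u must hold w[(t - u) mod k]; this is the layout at t = 0.
    slots = deque(w[(-u) % k] for u in range(k))
    acc = [0] * k          # one accumulator per kernel unit
    out = {}               # window start position -> score
    for t, sample in enumerate(x):
        for u in range(k):
            if t >= u:     # unit u is idle until its first window begins
                acc[u] += slots[u] * sample
                if (t - u) % k == k - 1:   # unit u just consumed its k-th sample
                    out[t - k + 1] = acc[u]
                    acc[u] = 0
        slots.rotate(1)    # rotational displacement: shift every weight one slot
    return [out[s] for s in range(len(x) - k + 1)]


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6]
    w = [2, -1, 3]
    print(shared_rotating_scores(x, w) == direct_scores(x, w))
```

In this sketch the input stream is never stalled: each sample is consumed by all active units in the same cycle, which is consistent with the abstract's claim that the strategy does not interrupt the processing flow. A hardware implementation would differ in many details.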
