Embedded object detection systems require 2D convolution hardware modules that consume fewer processing and storage resources while processing high-resolution frames with high performance. Existing solutions address performance, resource usage, and accuracy in isolation. This work introduces resource optimization strategies for 2D convolution modules, such as temporal and spatial memory sharing between kernel units. It also proposes a new strategy for aligning weights between units using rotational displacement, which allows the same memory to be divided among several kernel units. In a case study of pedestrian detection based on a support vector machine (SVM), the proposed solution reduced memory, logic elements, and registers by more than half compared with non-optimized solutions. The proposed strategies also achieved significant results with the image pyramid, reducing overall resources by almost a quarter. The new strategy did not reduce the detector’s performance because it does not interrupt the processing flow. The proposed solution reached 14 image pyramid levels at full HD resolution, with better accuracy, processing performance, resource occupancy, and power dissipation than existing SVM-based pedestrian detectors. Adopting these strategies could yield promising results in embedded deep-learning models.
Cambuim et al. studied this question.