Image and signal processing workloads are widely deployed on Graphics Processing Units (GPUs) for high throughput and on Field-Programmable Gate Arrays (FPGAs) for hardware specialization and energy efficiency. Soft GPU overlays on FPGAs aim to combine these advantages, yet existing solutions often depend on fixed hard processors or impose platform constraints that limit portability. This work extends a popular open-source soft GPGPU overlay to integrate a soft RISC-V control plane and enable compatibility with High-Bandwidth Memory (HBM2). The resulting system can be instantiated on FPGA boards without a hard ARM processor, improving portability, simplifying system integration, and broadening deployability. Across representative image and signal processing kernels, the soft GPGPU achieves geometric-mean speedups of 114.60× over a scalar soft RISC-V core and 19.72× over a hard ARM core, demonstrating substantial performance benefits while retaining FPGA reconfigurability. HBM2 integration further benefits bandwidth-sensitive workloads by increasing sustained throughput and reducing the performance bottlenecks associated with off-chip memory access. Collectively, these results indicate that GPU-like programmability and performance can be delivered on reconfigurable platforms without reliance on hard CPU subsystems, providing a portable and scalable foundation for embedded vision and DSP acceleration.
Hernandez et al. (Sun,) studied this question.
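For readers unfamiliar with the metric, the geometric-mean speedups quoted in the abstract can be read as the n-th root of the product of per-kernel speedups, which keeps a single very fast or very slow kernel from dominating the summary. The sketch below is illustrative only: the function name `geometric_mean_speedup` and the runtimes are hypothetical placeholders, not measurements or code from this work.

```python
import math


def geometric_mean_speedup(baseline_times, accelerated_times):
    """Geometric mean of per-kernel speedups (baseline / accelerated).

    Hypothetical helper for illustration; not taken from the paper.
    Both sequences must list the same kernels in the same order.
    """
    if len(baseline_times) != len(accelerated_times) or not baseline_times:
        raise ValueError("need two non-empty sequences of equal length")
    # Sum logarithms instead of multiplying ratios to avoid overflow
    # when there are many kernels or very large speedups.
    log_sum = sum(
        math.log(base / accel)
        for base, accel in zip(baseline_times, accelerated_times)
    )
    return math.exp(log_sum / len(baseline_times))


# Made-up runtimes in milliseconds (assumption, NOT data from the paper).
baseline_ms = [100.0, 400.0, 900.0]  # e.g. a scalar soft core
gpgpu_ms = [10.0, 10.0, 10.0]        # e.g. the soft GPGPU

summary = geometric_mean_speedup(baseline_ms, gpgpu_ms)
print(f"geometric-mean speedup: {summary:.2f}x")
```

Python's standard library also provides `statistics.geometric_mean` (Python 3.8+), which could replace the manual logarithm sum once the per-kernel speedup ratios have been computed.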