Vision-only driving perception systems are highly sensitive to illumination variations, particularly under low-light conditions where reduced contrast and structural degradation impair detection and segmentation accuracy. Rather than treating enhancement as a post-processing step, this work investigates the system-level impact of relocating low-light enhancement to the FPGA-based front end within a heterogeneous FPGA–ARM architecture. A hardware-accelerated visual pipeline is designed to perform color space conversion, fixed-point convolutional enhancement, and multi-channel fusion prior to high-level perception on the ARM processor. Experimental results demonstrate that the proposed FPGA-based front-end enhancement introduces only 13 ms of additional processing latency, which executes in parallel with the preceding frame’s neural network inference and therefore imposes zero net overhead on the end-to-end pipeline. In contrast, an equivalent software-based back-end enhancement approach would add its full processing time serially to the inference stage, increasing total system latency proportionally. The system achieves a sustained throughput of 58 fps while supporting real-time multi-task perception including lane detection (YOLOPv2, 539 ms per frame), object detection and emergency braking (YOLOv5, 432 ms per frame), and hardware-level multi-camera synchronization.
Xie et al. (Sun,) studied this question.