Computer vision-aided detection of small targets in moving streams, such as rivers or roads, requires a fast-converging model because the frame-processing demands are high. The bounding box of an object varies across the generated frames, which lowers detection precision. To address the problem of floating object detection, this article introduces a Region-Overlap Detection (ROD) method built on the Minimum Convoluted YOLOv7 (MCY) architecture. The method combines two techniques. First, the standard YOLO classifier identifies the largest overlap area among multiple overlapping regions. Second, the largest bounding box in an area is extracted with minimal convolution in the neural network's final training layer. Both techniques accurately identify small objects in flowing streams with high mean accuracy. The YOLO architecture trains its convolutional layers on the largest overlap area, which is shared by many bounding box regions. The intersecting areas are removed from the convolutional layers to speed up convergence and increase mean Average Precision (mAP). The proposed method achieves an mAP of 73.1% and a recall of 70.2% for small floating object detection in dynamic river environments.
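The first technique, finding the largest overlap area among several overlapping regions, can be illustrated with a small geometric sketch. This is not the authors' implementation: the abstract does not say how the overlap is computed, so the sketch assumes axis-aligned boxes in `(x1, y1, x2, y2)` form and compares pairwise intersections only. The function names `intersection`, `area`, and `largest_overlap` are hypothetical.

```python
from itertools import combinations


def intersection(a, b):
    """Return the overlap rectangle of two (x1, y1, x2, y2) boxes, or None."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x2 <= x1 or y2 <= y1:
        return None  # the boxes do not overlap
    return (x1, y1, x2, y2)


def area(box):
    """Area of an (x1, y1, x2, y2) box."""
    return (box[2] - box[0]) * (box[3] - box[1])


def largest_overlap(boxes):
    """Return the largest pairwise overlap region, or None if no boxes overlap.

    Assumption: overlap is measured between pairs of boxes; the source does
    not specify whether pairs or larger groups of regions are compared.
    """
    best = None
    for a, b in combinations(boxes, 2):
        region = intersection(a, b)
        if region is not None and (best is None or area(region) > area(best)):
            best = region
    return best


# Hypothetical candidate boxes for one frame.
candidates = [(0, 0, 10, 10), (5, 5, 15, 15), (8, 8, 20, 20), (100, 100, 110, 110)]
print(largest_overlap(candidates))
```

In a real detector the candidate boxes would come from the network's predictions for a frame; here they are made-up values chosen so that one pair overlaps more than the others.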
Wang et al. (Sat,) studied this question.