Multi-sensor information fusion is widely used for environment perception in unmanned aerial vehicles. In practice, however, perception accuracy still needs improvement, because the sensors are not fully consistent with one another and the fused data has limited utility. This paper proposes a deep learning method based on multi-stage fusion of millimeter-wave radar and camera data. In the data preprocessing stage, radar reflection points and image pixels are fused with Gaussian weighting to obtain a salient image. The salient density of each pixel relative to the radar reflection points is calculated, and the resulting salient density map is segmented with a threshold to complete visual target detection. In the deep learning detection stage, a network is designed to fuse the salient image and the visual target detection images at different convolution depths, and the class, location, and size of each target are regressed through training. In the post-decoding stage, the radar reflection points are fused into local non-maximum suppression, with the suppression operation starting from the radar reflection points. Unlike typical detection methods, the proposed method improves detection accuracy by fusing radar and camera feature information across multiple stages. Experimental results demonstrated that mAP0.90 increased by 3.9% and 4.3%. For complex scenarios, mAP0.50 improved by 2.4%, mAP0.75 improved by 4.9%, and mAP0.90 improved by 6.9%, indicating that the proposed method is effective compared with the state-of-the-art YOLOv8 model.
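The preprocessing stage described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the exact weighting formula, so the sketch assumes an isotropic Gaussian centred on each radar reflection point, and assumes the radar points have already been projected to pixel coordinates. The names `salient_density_map`, `segment`, `sigma`, and `threshold` are hypothetical.

```python
import math


def salient_density_map(height, width, radar_points, sigma=2.0):
    """Sum one Gaussian per radar point over an image grid.

    radar_points: list of (row, col) pixel coordinates. Assumption: the
    radar-to-image projection has been done beforehand.
    sigma: Gaussian spread in pixels (hypothetical parameter).
    """
    density = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for pr, pc in radar_points:
                dist_sq = (r - pr) ** 2 + (c - pc) ** 2
                density[r][c] += math.exp(-dist_sq / (2.0 * sigma ** 2))
    return density


def segment(density, threshold=0.5):
    """Binarize the density map: 1 where density >= threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in density]


# Usage: a 5x5 grid with a single radar reflection point at the centre.
density = salient_density_map(5, 5, [(2, 2)], sigma=1.0)
mask = segment(density, threshold=0.5)
for row in mask:
    print(row)
```

With one radar point and `sigma=1.0`, only the centre pixel and its four direct neighbours exceed the 0.5 threshold, so the mask marks a small cross-shaped candidate region around the reflection. A real pipeline would likely vectorize this with an array library and choose `sigma` and `threshold` from the data.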
Wang et al. (Tue,) studied this question.