This paper describes a multimodal sensor fusion system developed on the Raspberry Pi platform for real-time object detection and distance estimation. The architecture combines a camera for vision with a 24 GHz mmWave radar sensor for range sensing. A pretrained YOLO model performs real-time object detection, identifying classes such as persons in live video frames. The radar provides the distance measurement, which is smoothed with a 1-D linear Kalman filter and then fused with the camera data. Experimental results show that the fused system offers significantly higher stability in object detection and distance estimation than single-sensor readings. A final configuration, in which the radar measurement is activated only when the detected object is at the centre of the frame, achieved near-accurate results with less noise. The proposed system is lightweight and cost-effective, making it suitable for real-time perception in low-cost embedded applications in autonomous vehicles and intelligent surveillance.
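As an illustration of the filtering and gating steps summarised above, the following is a minimal sketch and not the authors' implementation. It assumes a constant-distance (random-walk) state model for the 1-D Kalman filter, and the noise variances `q` and `r`, the centring `tolerance`, and the names `Kalman1D`, `is_centred`, and `fuse` are hypothetical choices made for the example. The gate checks horizontal centring only, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Kalman1D:
    """Scalar Kalman filter with an assumed constant-distance model."""
    x: float = 0.0   # current distance estimate (metres)
    p: float = 1.0   # variance of the estimate
    q: float = 0.01  # process noise variance (hypothetical value)
    r: float = 0.25  # measurement noise variance (hypothetical value)

    def update(self, z: float) -> float:
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        self.p += self.q
        # Correct: blend the prediction with the new radar reading z.
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


def is_centred(box, frame_width, tolerance=0.1):
    """True if the box centre lies within tolerance * frame_width
    of the horizontal frame centre. box is (x_min, y_min, x_max, y_max)."""
    x_min, _, x_max, _ = box
    centre_x = (x_min + x_max) / 2
    return abs(centre_x - frame_width / 2) <= tolerance * frame_width


def fuse(box, frame_width, radar_range_m, kf):
    """Return a filtered distance only when the detection is centred;
    otherwise skip the radar reading and return None."""
    if not is_centred(box, frame_width):
        return None
    return kf.update(radar_range_m)
```

In use, each detected bounding box from the object detector would be passed to `fuse` together with the latest radar range, so that off-centre detections never update the filter state.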
K et al. (Wed,) studied this question.