Odometry estimation, which calculates the trajectory of a moving object across timeframes, is a critical and time-consuming function in SLAM (Simultaneous Localization and Mapping) systems. Although LiDAR-based sensing is the most popular choice for outdoor and long-range applications because of its ranging accuracy, the sparsity of laser point clouds poses a significant challenge to feature extraction and matching in odometry estimation. In this paper, we investigate odometry estimation from two aspects: algorithm optimization and system design/implementation. In algorithm optimization, we present an image feature-assisted odometry estimation scheme that leverages the richness of image information captured by a companion camera to enhance the accuracy of laser point cloud matching. The image features also serve as a screening mechanism that reduces the number of points to be matched and lowers the computing complexity, allowing a higher estimation rate. In addition, several schemes, namely an adaptive threshold in image feature point selection, principal component analysis (PCA)-based plane fitting for laser point interpolation, and Gauss–Newton optimization for calculating the transform matrix, are employed to improve the accuracy of odometry estimation. The performance of the improved odometry estimation is verified using the existing FLOAM (Fast Lidar Odometry and Mapping) framework, with the KITTI dataset for autonomous vehicles, which provides ground truth, as the test bench. Simulation results indicate that the translation error and rotation error can be reduced by 16.6% and 1.3%, respectively. Computing complexity, measured as the software execution time, is also reduced by 63%. In system implementation, a hardware/software (HW/SW) co-design strategy was adopted, in which complexity profiling was first conducted to determine the task partitioning, and the time-consuming tasks were then offloaded to a hardware accelerator.
This facilitates real-time execution on a resource-constrained embedded platform consisting of a microprocessor module (Raspberry Pi) and an attached FPGA board (Pynq Z2). Efficient hardware designs for the customized DSP functions (adaptive threshold and PCA) were developed on the FPGA and can complete one data frame in 20 ms. The final system implementation met the target throughput of 10 estimations per second and can be scaled up further.
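As an illustration of the PCA-based plane fitting mentioned above, the following is a minimal sketch rather than the implementation used in this work. It assumes that a small neighborhood of laser points around an image feature is approximately planar, and that the interpolated laser point is taken where a viewing ray from the sensor origin meets the fitted plane. The function names `fit_plane_pca` and `interpolate_on_ray`, and the ray-intersection step itself, are hypothetical choices made for this example.

```python
import numpy as np


def fit_plane_pca(points):
    """Fit a plane to an (N, 3) array of points using PCA.

    Returns (centroid, unit_normal, residual), where residual is the
    smallest eigenvalue of the covariance matrix (the variance of the
    points along the normal; near zero for a well-fitting plane).
    """
    pts = np.asarray(points, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 3 or pts.shape[0] < 3:
        raise ValueError("expected at least three 3-D points")
    centroid = pts.mean(axis=0)
    centered = pts - centroid
    cov = centered.T @ centered / pts.shape[0]
    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # smallest eigenvalue is the direction of least variance, i.e. the normal.
    eigvals, eigvecs = np.linalg.eigh(cov)
    normal = eigvecs[:, 0]
    return centroid, normal, eigvals[0]


def interpolate_on_ray(centroid, normal, ray_dir, eps=1e-9):
    """Intersect a ray from the sensor origin with the fitted plane.

    Hypothetical interpolation step: returns the 3-D point on the plane
    along ray_dir, or None if the ray is (nearly) parallel to the plane.
    """
    ray = np.asarray(ray_dir, dtype=float)
    denom = normal @ ray
    if abs(denom) < eps:
        return None
    t = (normal @ centroid) / denom  # solves normal . (t * ray - centroid) = 0
    return t * ray


if __name__ == "__main__":
    # Synthetic neighborhood lying on the plane z = 2 (assumed data).
    neighborhood = np.array(
        [[0.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 1.0, 2.0],
         [1.0, 1.0, 2.0], [0.5, 0.2, 2.0]]
    )
    c, n, res = fit_plane_pca(neighborhood)
    print("normal:", n, "residual:", res)
    print("interpolated point:", interpolate_on_ray(c, n, [1.0, 0.0, 1.0]))
```

The residual returned by the fit could serve as a planarity check, so that neighborhoods that are not well described by a plane are rejected before interpolation; whether such a check is applied in the actual system is not stated here.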
Li et al. (Thu,) studied this question.