Targetless LiDAR–camera extrinsic calibration remains challenging because cross-modal correspondences are unreliable and the result is sensitive to initialization. We present a targetless extrinsic calibration framework based on class-agnostic boundary mask alignment in a shared image-plane representation. The framework first constructs consistent LiDAR–camera mask pairs from camera images and from image-plane depth and intensity projections of the LiDAR data. It then obtains robust initial pose candidates through a bounded, rotation-only global initialization and refines them with a computationally efficient stochastic gradient approximation to estimate the extrinsic parameters. Experiments on the KITTI benchmark demonstrate a superior accuracy–runtime trade-off compared with a segmentation-based global optimization baseline, while real-world driving tests confirm stable cross-modal alignment under vibration and inter-modal timing jitter.
Jeong et al. studied this question.
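The image-plane depth and intensity projections mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `project_lidar_to_image`, its parameters, the distortion-free pinhole model, and the nearest-point rule for pixels hit by several returns are all assumptions made for illustration.

```python
import numpy as np

def project_lidar_to_image(points, intensity, K, R, t, height, width):
    """Render sparse depth and intensity maps from LiDAR points (hypothetical helper).

    points:    (N, 3) xyz coordinates in the LiDAR frame
    intensity: (N,) per-point reflectance
    K:         (3, 3) pinhole intrinsics (lens distortion ignored, an assumption)
    R, t:      candidate LiDAR-to-camera rotation (3, 3) and translation (3,)
    """
    cam = points @ R.T + t                      # LiDAR frame -> camera frame
    front = cam[:, 2] > 1e-6                    # keep points in front of the camera
    cam, inten = cam[front], intensity[front]

    uvw = cam @ K.T                             # homogeneous pixel coordinates
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, inten = u[inside], v[inside], cam[inside, 2], inten[inside]

    # Z-buffer: keep the smallest depth among points that share a pixel.
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)

    # Copy the intensity of the point that won the depth test at each pixel.
    refl = np.zeros((height, width))
    nearest = z == depth[v, u]
    refl[v[nearest], u[nearest]] = inten[nearest]

    depth[np.isinf(depth)] = 0.0                # 0 marks pixels with no LiDAR return
    return depth, refl
```

Because both maps live on the camera's pixel grid, masks extracted from them can be compared directly with masks extracted from the camera image, and re-rendering under a different `R` and `t` shows how a candidate extrinsic changes that alignment.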