UAV remote sensing demands efficient, large-scale mapping for critical applications such as emergency response and resource surveying. Traditional single-agent SLAM, however, is constrained by limited flight endurance, which motivates a shift to multi-UAV collaborative frameworks. We propose MSRS-SLAM, a real-time dense reconstruction framework for UAV swarms. The pipeline uses visual odometry for pose estimation, followed by deep multi-view depth estimation to generate local dense maps. To keep the maps of different agents consistent, we introduce a two-stage map fusion strategy: the first stage obtains an initial alignment through 3D-3D feature matching that leverages multi-frame co-visibility, RANSAC, and the Umeyama algorithm, and the second stage applies dense point cloud registration to refine the relative pose. Global consistency is enforced by correcting accumulated errors through intra-agent and inter-agent loop closure detection coupled with global pose graph optimization. In addition, an improved text-prompted SAM3 model is integrated, enabling multi-UAV cooperative semantic dense reconstruction. Evaluation on four real-world datasets shows that MSRS-SLAM achieves an Absolute Pose Error (APE) between 0.21 m and 0.70 m. In terms of efficiency, the system maintains a stable processing throughput, completing the reconstruction of 495 images from the Village dataset in 6.34 min, a several-fold reduction in average processing time relative to offline baselines. These results indicate that MSRS-SLAM achieves a favorable balance between computational efficiency and reconstruction precision, improving the viability of real-time collaborative mapping.
Du et al. (Thu,) studied this question.
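The first fusion stage described above combines RANSAC with the Umeyama algorithm to align matched 3D points from two agents. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the 3D-3D correspondences are already available as two matched NumPy arrays (in the paper they come from multi-frame co-visibility feature matching), and the function names `umeyama` and `ransac_umeyama`, as well as the parameters `iters` and `thresh`, are hypothetical choices made for illustration.

```python
import numpy as np


def umeyama(src, dst, with_scale=True):
    """Least-squares similarity transform (s, R, t) such that dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of matched 3D points, N >= 3.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Reflection guard: force det(R) = +1.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = float(np.trace(np.diag(D) @ S) / var_src) if with_scale else 1.0
    t = mu_dst - s * R @ mu_src
    return s, R, t


def ransac_umeyama(src, dst, iters=200, thresh=0.05, seed=0):
    """Robust alignment: fit Umeyama on random 3-point samples, keep the
    hypothesis with the most inliers, then refit on all of its inliers.

    iters and thresh (in the units of the point coordinates) are
    illustrative defaults, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)

    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        s, R, t = umeyama(src[idx], dst[idx])
        # Residual of every correspondence under this hypothesis.
        residuals = np.linalg.norm(dst - (s * src @ R.T + t), axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers

    if best_inliers.sum() < 3:
        raise ValueError("RANSAC found no consensus set of at least 3 points")

    # Final refit on the full consensus set.
    return umeyama(src[best_inliers], dst[best_inliers]), best_inliers
```

Calling `ransac_umeyama(src, dst)` returns the tuple `(s, R, t)` together with a boolean inlier mask. In a pipeline like the one described, such a coarse estimate would serve only as the starting point for the second stage, where dense point cloud registration refines the relative pose.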