This work presents a two-stage autoencoder architecture that improves depth estimation for Autonomous Mobile Robot (AMR) applications by distilling Apple's DepthPro model and integrating LiDAR data. It addresses limitations of existing depth estimation technologies when applied to warehouse robotics, where accurate depth perception is essential for tasks such as picking and placing pallets. The two-stage autoencoder combines the strengths of RGB-based depth estimation with sparse but accurate LiDAR measurements. The first stage uses knowledge distillation to transfer the Apple DepthPro model into a more efficient architecture suitable for mobile robots while preserving structural integrity; ResNet18, ResNet50, MobileNetV2, Swin-T, ViT-B-16, and MobileNetV3-S are evaluated as backbones. The second stage projects LiDAR point clouds into image space and incorporates them into the loss function, aligning the estimated depth with real-world geometric measurements while preserving the structural integrity learned in the first stage. Three autoencoder variants with different multimodal fusion strategies are explored: Variant I uses three independent encoders that process RGB, depth, and segmentation data simultaneously; Variant II uses two encoders for bimodal pairs (RGB with depth, or RGB with segmentation); and Variant III is a single-encoder baseline that uses only RGB or only depth data. Each variant is evaluated with both direct concatenation and attention-based feature fusion. Evaluation used real-world data collected in a warehouse environment and covered various combinations of architecture variants, fusion strategies, and loss functions. The results show that the proposed two-stage approach improves accuracy, perceptual quality, and robustness across varying scenes and lighting conditions.
• Two-stage depth estimation autoencoder architecture. In the first stage, the depth estimation model is distilled from Apple's DepthPro to preserve structural geometry. The second stage refines the estimated depth with accurate metric (LiDAR) values via fine-tuning, accounting for depth consistency and structural geometry.
• Real-world evaluation on a dataset acquired in an industrial warehouse by an AMR with onboard LiDAR and camera sensors.
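The second stage relies on LiDAR point clouds projected into image space. The projection step is not detailed here, so the following is a minimal sketch assuming a pinhole camera with intrinsic matrix `K` and a known 4×4 LiDAR-to-camera extrinsic `T_cam_lidar`; the function name and its arguments are hypothetical. Points behind the camera or outside the image are discarded, the nearest point wins when several land on one pixel, and pixels without a return are set to 0, giving a sparse depth map usable as supervision.

```python
import numpy as np

def project_lidar_to_depth(points_lidar, T_cam_lidar, K, height, width):
    """Project LiDAR points (N, 3) into a sparse (height, width) depth map.

    Hypothetical helper: assumes a pinhole camera with intrinsics K (3x3) and a
    rigid LiDAR-to-camera transform T_cam_lidar (4x4). 0 marks "no LiDAR return".
    """
    n = points_lidar.shape[0]
    # Homogeneous coordinates, then rigid transform into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera (positive depth).
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    z = pts_cam[:, 2]
    # Pinhole projection: [u, v, 1]^T = K [x/z, y/z, 1]^T.
    uv = (K @ (pts_cam / z[:, None]).T).T
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    # Discard points that fall outside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]
    # Z-buffer: keep the nearest depth when several points hit the same pixel.
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth
```

`np.minimum.at` is used for the z-buffer because plain fancy-index assignment gives no guarantee about which value is kept when pixel indices repeat.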
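The two training stages can be illustrated with a minimal NumPy sketch of their losses. The exact loss terms and weights are not given here, so this assumes an L1 term plus an image-gradient term for stage 1 (distillation against the DepthPro output) and, for stage 2, an L1 term on pixels that have a LiDAR return plus a gradient-consistency term against the stage-1 prediction. All function names and the weights `w_grad` and `w_struct` are hypothetical.

```python
import numpy as np

def gradient_loss(pred, target):
    """Mean absolute difference of horizontal and vertical depth gradients."""
    dx_p, dy_p = np.diff(pred, axis=1), np.diff(pred, axis=0)
    dx_t, dy_t = np.diff(target, axis=1), np.diff(target, axis=0)
    return np.abs(dx_p - dx_t).mean() + np.abs(dy_p - dy_t).mean()

def stage1_distillation_loss(student, teacher, w_grad=0.5):
    """Stage 1 (assumed form): dense L1 to the teacher depth plus a
    gradient term as a stand-in for the structural-integrity objective."""
    return np.abs(student - teacher).mean() + w_grad * gradient_loss(student, teacher)

def stage2_lidar_loss(pred, lidar_sparse, stage1_pred, w_struct=0.5):
    """Stage 2 (assumed form): metric L1 only where LiDAR has a return (> 0),
    plus a term keeping the refined map's gradients close to stage 1."""
    valid = lidar_sparse > 0
    metric = np.abs(pred[valid] - lidar_sparse[valid]).mean() if valid.any() else 0.0
    return metric + w_struct * gradient_loss(pred, stage1_pred)
```

Because the gradient term ignores a constant depth offset, stage 2 can shift absolute depth toward the LiDAR values while still being penalized for distorting the structure inherited from stage 1.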
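The two fusion mechanisms compared for each variant can be contrasted on encoder feature maps. The attention design is not specified here; the sketch below assumes a simple per-pixel softmax over modalities, with a fixed vector `score_weights` standing in for a learned scoring layer (hypothetical, as are the function names). Concatenation grows the channel count with the number of encoders, whereas this attention form returns a single C-channel map whose per-pixel modality weights sum to one.

```python
import numpy as np

def concat_fusion(features):
    """Direct concatenation: stack M modality maps (C, H, W) into (M*C, H, W)."""
    return np.concatenate(features, axis=0)

def attention_fusion(features, score_weights):
    """Per-pixel soft attention over modalities (assumed design).

    features: list of M arrays, each (C, H, W).
    score_weights: (C,) vector, a hypothetical stand-in for a learned scoring layer.
    Returns the fused (C, H, W) map and the (M, H, W) attention weights.
    """
    stacked = np.stack(features, axis=0)                       # (M, C, H, W)
    scores = np.einsum("mchw,c->mhw", stacked, score_weights)  # one score per modality and pixel
    scores -= scores.max(axis=0, keepdims=True)                # numerical stability for exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)              # softmax over modalities
    fused = np.einsum("mhw,mchw->chw", weights, stacked)       # weighted sum of modality features
    return fused, weights
```

In Variant I, `features` would hold the RGB, depth, and segmentation encoder outputs; in Variant II, one of the two bimodal pairs; Variant III has a single encoder and needs no fusion.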