To address the accuracy problems in monocular depth estimation caused by insufficient feature extraction and inadequate context modeling, we propose EFDepth, a multi-scale feature optimization model. The framework adopts an encoder-decoder structure. The encoder (EC-Net) consists of MobileNetV3-E and ETFBlock and refines its features through multi-scale dilated convolution. The decoder (LapFA-Net) combines a Laplacian pyramid with the FMA module to strengthen cross-scale feature fusion and output accurate depth maps. EFDepth was compared with Lite-mono, Hr-depth, Lapdepth, and other algorithms on the KITTI dataset. On the three error metrics, RMSE (Root Mean Square Error), AbsRel (Absolute Relative Error), and SqRel (Squared Relative Error), EFDepth is lower than the average of the comparison algorithms by 1.623, 0.030, and 0.445, respectively. On the three accuracy metrics, it is higher than that average by 0.052, 0.023, and 0.011, respectively. These results indicate that EFDepth outperforms the comparison methods on most metrics and offers a useful reference for monocular depth estimation and 3D reconstruction of complex scenes.
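For readers unfamiliar with the metrics named above, the following is a minimal sketch of how RMSE, AbsRel, and SqRel are commonly computed from paired predicted and ground-truth depths. The abstract does not name the three accuracy metrics; the sketch assumes they are the threshold accuracies δ < 1.25, 1.25², and 1.25³ that are customary in KITTI evaluations. The function name `depth_metrics` and the flat-list input are illustrative choices, not taken from the paper.

```python
import math


def depth_metrics(pred, gt):
    """Common monocular-depth metrics over paired depth values.

    pred, gt: equal-length sequences of depths (e.g. in metres).
    Pairs with a non-positive value are skipped, mirroring the usual
    masking of pixels that lack valid ground truth.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    n = len(pairs)

    sq_err = [(p - g) ** 2 for p, g in pairs]
    rmse = math.sqrt(sum(sq_err) / n)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum(e / g for e, (_, g) in zip(sq_err, pairs)) / n

    # Assumed accuracy metrics: share of pixels whose ratio
    # max(pred/gt, gt/pred) falls below 1.25, 1.25^2, 1.25^3.
    ratios = [max(p / g, g / p) for p, g in pairs]
    d1, d2, d3 = (sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3))

    return {"rmse": rmse, "abs_rel": abs_rel, "sq_rel": sq_rel,
            "delta1": d1, "delta2": d2, "delta3": d3}


if __name__ == "__main__":
    print(depth_metrics([2.0, 4.5, 12.0], [2.0, 5.0, 8.0]))
```

Lower is better for the three error metrics and higher is better for the threshold accuracies, which is why the abstract reports the former as reductions and the latter as gains.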
Liu et al. (Thu,) studied this question.