ABSTRACT Depth estimation is widely used in computer vision, and current approaches rely primarily on unsupervised deep neural networks, often with increasingly deep architectures. However, adding layers can slow convergence and lead to suboptimal performance. To overcome these issues, we introduce a novel architecture based on model distillation, in which a teacher network enhances the learning of a preceding student network. To improve network speed, we integrate an ordinal module into the decoder of the teacher network for weight normalization. This module classifies the weights and filters out those with the lowest information content. After classification, the higher a weight's category value, the less useful information it is taken to carry. Furthermore, we incorporate a residual stratification module, which adapts 2D image feature extraction methods to 3D depth. It provides a finer, multi‐scale feature representation and expands the receptive field at each layer of the network, thereby enhancing the accuracy and robustness of depth estimation. Experiments on the publicly available KITTI dataset show that the proposed method accelerates network training compared with the benchmark algorithm and reduces the relative squared error by 2.3% and the root‐mean‐square error by 3.3%, validating the effectiveness of our approach.
Kuang et al. studied this question.
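The two error measures quoted in the abstract can be illustrated with a minimal sketch. It assumes that "relative squared error" refers to the squared relative error (Sq Rel) commonly reported on KITTI, and that pixels without ground truth are marked with a non-positive depth. The function name `depth_errors` and the flat-list input format are hypothetical choices for illustration, not the authors' evaluation code.

```python
import math


def depth_errors(pred, gt):
    """Return (sq_rel, rmse) for predicted vs. ground-truth depths.

    Assumption: entries with gt <= 0 have no valid ground truth and are skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    n = len(pairs)
    # Squared relative error: mean of (pred - gt)^2 / gt over valid pixels.
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Root-mean-square error: sqrt of the mean squared difference.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    return sq_rel, rmse
```

For example, `depth_errors([2.0, 4.0], [1.0, 2.0])` gives a squared relative error of 1.5 and an RMSE of about 1.58. A percentage reduction such as those reported above would then be computed relative to the baseline's value of the same metric.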