Depth estimation is a critical step for many computer vision tasks, such as scene understanding, registration, and localization. View-synthesis-based methods estimate depth in a self-supervised framework, without expensive ground-truth depth. However, the underlying problem is ill-posed. A general remedy for ill-posed problems is to incorporate relevant constraints and regularization. To this end, we propose a new attention module and a loss term that enforces causation between a 2-D image and its corresponding depth map. The results show that the proposed method improves both accuracy and training time. In particular, it converges 6 epochs faster than the base model, MonoDepth2, and outperforms it on standard metrics, e.g., by 6% on RMSlog.
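For readers unfamiliar with the RMSlog metric cited above, the following is a minimal sketch. It assumes the usual definition from the monocular depth literature: the root-mean-square difference between the logarithms of predicted and ground-truth depth, taken over pixels with valid ground truth. The function name `rmse_log` and the toy depth values are illustrative assumptions, not taken from the paper.

```python
import math


def rmse_log(pred, gt):
    """RMS error between log depths (assumed standard definition).

    `pred` and `gt` are flat sequences of depths in the same unit.
    Pixels where either depth is non-positive are skipped, since the
    logarithm is undefined there.
    """
    squared = [
        (math.log(p) - math.log(g)) ** 2
        for p, g in zip(pred, gt)
        if p > 0 and g > 0
    ]
    if not squared:
        raise ValueError("no valid pixels to evaluate")
    return math.sqrt(sum(squared) / len(squared))


# Toy values for illustration only (not results from the paper).
predicted = [2.0, 4.0, 8.0]
ground_truth = [2.0, 4.0, 8.0]
print(rmse_log(predicted, ground_truth))  # identical depths give zero error
```

Because the error is computed on log depths, it depends on the ratio of predicted to true depth rather than on their absolute difference, so a 10% error counts the same at 2 m as at 50 m.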
Zarei et al. studied this question.