Abstract

Background: Monocular depth estimation in endoscopic scenes is a fundamental prerequisite for intraoperative 3D reconstruction and surgical navigation. However, weak textures, specular reflections, local overexposure, and dynamic illumination changes in endoscopic environments can undermine self-supervised learning methods based on photometric consistency, leading to insufficient structural representation and limited prediction accuracy.

Methods: To address this issue, this paper presents a task-specific architectural extension of the EndoDAC framework for self-supervised monocular endoscopic depth estimation. Specifically, the proposed method adapts and integrates geometry- and illumination-aware feature enhancement, offline multi-generation self-distillation, and inference-stage structural fusion to improve feature representation, prediction refinement, and deployment efficiency under weak-texture and adverse illumination conditions. The SCARED dataset is used for training and in-domain evaluation, and zero-shot cross-domain testing is performed on the Hamlyn dataset.

Results: The proposed EndoDAC-based extension improves depth estimation performance on the SCARED dataset. In zero-shot evaluation on Hamlyn, it achieves lower error metrics than the baseline, while threshold accuracy remains comparable to the baseline.

Conclusions: Task-specific adaptation of geometry- and illumination-aware feature enhancement, offline self-distillation, and inference-stage fusion can improve an EndoDAC-based self-supervised endoscopic depth estimation pipeline. These results support the effectiveness of the proposed architectural adaptation, while also indicating that further work is needed to improve cross-domain scale consistency and robustness under more challenging surgical conditions.
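The abstract distinguishes error metrics from threshold accuracy without naming them. As a point of reference, the sketch below computes the metrics conventionally reported for monocular depth estimation (Abs Rel, Sq Rel, RMSE, RMSE log, and the δ < 1.25 threshold accuracy) after median scaling. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `depth_metrics`, the depth limits, and the use of median scaling are all assumptions that may differ from the paper's protocol.

```python
import math
from statistics import median


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=150.0):
    """Conventional depth metrics on flat sequences of per-pixel depths.

    Hypothetical helper: min_depth and max_depth are assumed limits,
    not values taken from the paper.
    """
    # Keep only pixels whose ground-truth depth lies in the valid range.
    pairs = [(p, g) for p, g in zip(pred, gt) if min_depth < g < max_depth]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")

    # Median scaling (assumed): self-supervised monocular models predict
    # depth only up to an unknown scale, so align medians before scoring.
    scale = median(g for _, g in pairs) / median(p for p, _ in pairs)
    pairs = [(min(max(p * scale, min_depth), max_depth), g) for p, g in pairs]
    n = len(pairs)

    # Error metrics: lower is better.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    # Threshold accuracy: fraction of pixels with max(p/g, g/p) < 1.25;
    # higher is better.
    delta_1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta_1": delta_1,
    }
```

Because the two metric families measure different things, a method can lower Abs Rel or RMSE while leaving δ < 1.25 nearly unchanged, which is the pattern the Results paragraph describes for Hamlyn.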
Wei et al. studied this question.