What question did this study set out to answer?

This research aims to enhance monocular depth estimation accuracy in endoscopic environments affected by poor texture and lighting conditions.

July 1, 2026 | Open Access

MonoGID: geometry and illumination aware enhancement with distillation for self-supervised monocular endoscopic depth estimation

Key Points

Proposed a task-specific extension of the EndoDAC framework for depth estimation.
Integrated geometry- and illumination-aware feature enhancement and offline multi-generation self-distillation.
Conducted experiments on the SCARED dataset for in-domain training and evaluation, with zero-shot cross-domain testing on the Hamlyn dataset.
The EndoDAC-based extension improved depth estimation performance on the SCARED dataset.
Achieved lower error-oriented metrics than the baseline in zero-shot evaluation on the Hamlyn dataset.
Threshold accuracy on Hamlyn remained comparable to the baseline.

Abstract

Background: Monocular depth estimation in endoscopic scenes is a fundamental prerequisite for intraoperative 3D reconstruction and surgical navigation. However, factors such as weak textures, specular reflections, local overexposure, and dynamic illumination changes in endoscopic environments can undermine the effectiveness of photometric-consistency-based self-supervised learning methods, leading to insufficient structural representation and limited prediction accuracy.

Methods: To address this issue, this paper presents a task-specific architectural extension of the EndoDAC framework for self-supervised monocular endoscopic depth estimation. Specifically, the proposed method adapts and integrates geometry- and illumination-aware feature enhancement, offline multi-generation self-distillation, and inference-stage structural fusion to improve feature representation, prediction refinement, and deployment efficiency under weak-texture and adverse illumination conditions. Experiments are conducted on the SCARED dataset for training and in-domain evaluation, while zero-shot cross-domain testing is performed on the Hamlyn dataset.

Results: The results show that the proposed EndoDAC-based extension improves depth estimation performance on the SCARED dataset and achieves lower error-oriented metrics than the baseline on Hamlyn zero-shot evaluation, with threshold accuracy remaining comparable to the baseline.

Conclusions: The proposed method demonstrates that task-specific adaptation of geometry- and illumination-aware feature enhancement, offline self-distillation, and inference-stage fusion can improve an EndoDAC-based self-supervised endoscopic depth estimation pipeline. These results support the effectiveness of the proposed architectural adaptation, while also indicating that further work is needed to improve cross-domain scale consistency and robustness under more challenging surgical conditions.
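The abstract distinguishes "error-oriented metrics" from "threshold accuracy" without naming them. As a rough illustration only, the sketch below computes the metrics most commonly reported in the monocular depth estimation literature (Abs Rel, Sq Rel, RMSE, RMSE log, and the δ < 1.25 threshold accuracy), with optional per-image median scaling of the kind typically applied to scale-ambiguous self-supervised predictions. Whether the paper uses exactly this set and this scaling protocol is an assumption, and the function name `depth_metrics` is hypothetical.

```python
import math
from statistics import median


def depth_metrics(pred, gt, use_median_scaling=True):
    """Compute common depth metrics over paired, valid (positive) depth values.

    Assumption: this metric set and the median-scaling step follow common
    practice in self-supervised depth estimation, not necessarily the paper.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")

    if use_median_scaling:
        # Align the scale-ambiguous prediction to ground truth per image.
        scale = median(gt) / median(pred)
        pred = [p * scale for p in pred]

    n = len(gt)
    pairs = list(zip(pred, gt))

    # Error-oriented metrics: lower is better.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n)

    # Threshold accuracy: fraction of pixels with max(p/g, g/p) < 1.25; higher is better.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta1": delta1,
    }
```

Because the threshold metric only counts pixels inside a ratio band, a method can lower its average errors while leaving the threshold accuracy nearly unchanged, which is consistent with the pattern the abstract reports for the Hamlyn evaluation.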
