Geometric foundation models have recently demonstrated remarkable performance in depth estimation, benefiting from large-scale training data that lets them learn intricate geometric structures and spatial dependencies. However, their large parameter counts and high computational cost make it difficult to meet the efficiency requirements of downstream surgical applications. Consequently, designing a high-performance yet lightweight monocular depth estimator has become a research focus. To this end, we harness the rich geometric priors encoded in geometric foundation models and introduce a novel trinity distillation scheme that transfers geometric knowledge across three complementary dimensions (spatial, spectral, and gradient) into a compact depth estimator. To further enhance prediction quality, we develop a semantic distribution alignment strategy that suppresses pseudo-texture artifacts arising from the limited semantic representation capability of the lightweight estimator. Extensive experiments on the SCARED, SERV-CT, Hamlyn, and C3VD datasets demonstrate that the proposed method surpasses or performs comparably to previous state-of-the-art competitors, with a smaller model size and reduced computational overhead. Code will be available at: https://github.com/ShuweiShao/LiteNet.
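To make the three distillation dimensions concrete, the following is a minimal sketch of what a spatial, spectral, and gradient distillation loss between a teacher and a student depth map could look like. It is not the authors' implementation: the abstract does not specify the loss terms, so the L1 distances, the FFT amplitude comparison, the finite-difference gradients, the function names, and the weights `w_spatial`, `w_spectral`, and `w_gradient` are all assumptions made for illustration.

```python
import numpy as np


def spatial_loss(student, teacher):
    """Assumed spatial term: mean absolute difference between depth maps."""
    return float(np.mean(np.abs(student - teacher)))


def spectral_loss(student, teacher):
    """Assumed spectral term: mean absolute difference of 2-D FFT amplitudes."""
    amp_student = np.abs(np.fft.fft2(student))
    amp_teacher = np.abs(np.fft.fft2(teacher))
    return float(np.mean(np.abs(amp_student - amp_teacher)))


def gradient_loss(student, teacher):
    """Assumed gradient term: mean absolute difference of finite-difference gradients."""
    gy_s, gx_s = np.gradient(student)  # derivatives along rows, then columns
    gy_t, gx_t = np.gradient(teacher)
    return float(np.mean(np.abs(gx_s - gx_t)) + np.mean(np.abs(gy_s - gy_t)))


def trinity_distillation_loss(student, teacher,
                              w_spatial=1.0, w_spectral=1.0, w_gradient=1.0):
    """Weighted sum of the three illustrative terms (weights are hypothetical)."""
    return (w_spatial * spatial_loss(student, teacher)
            + w_spectral * spectral_loss(student, teacher)
            + w_gradient * gradient_loss(student, teacher))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_depth = rng.uniform(1.0, 2.0, size=(8, 8))  # stand-in teacher prediction
    student_depth = teacher_depth + rng.normal(0.0, 0.05, size=(8, 8))
    print(trinity_distillation_loss(student_depth, teacher_depth))
```

In this sketch the three terms respond to different kinds of error: a constant depth offset changes the spatial term but leaves the gradient term at zero, while high-frequency noise shows up strongly in the gradient and spectral terms. That complementarity is the motivation the abstract gives for combining the three dimensions.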
Zhu et al. studied this question.