Event-based monocular depth estimation is crucial for applications such as autonomous driving, obstacle avoidance, and navigation in high-speed scenarios. Events form a unique, irregular modality. To adapt them to neural networks, some studies convert event streams into event voxels or other frame-like representations. However, these approaches tend to lose the temporal characteristics of events. In this study, we propose a network that aggregates global voxel features and per-channel local temporal features of event voxels across the temporal dimension, explicitly extracting the temporal information of events. Furthermore, because noise in events can interfere with training and is more difficult to predict than noise in images, we use an uncertainty estimation module to mitigate the impact of uncertain factors and enhance the robustness of the model. Additionally, we employ multi-level depth features for supervisory training, which improves prediction performance compared to methods relying solely on ground-truth depth supervision. Experiments on open-source datasets demonstrate the effectiveness of the proposed method. Our code can be found at https://github.com/WuShangjie/GLUNET.
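To make the event-voxel representation mentioned in the abstract concrete, the following is a minimal sketch of one common way to bin an event stream into a voxel grid, with each event's polarity split linearly between its two nearest temporal bins. It is not taken from the GLUNET repository: the function name `events_to_voxel`, its parameters, and the choice of linear temporal interpolation are assumptions made for illustration only.

```python
import numpy as np


def events_to_voxel(x, y, t, p, num_bins, height, width):
    """Bin events into a (num_bins, height, width) voxel grid.

    Hypothetical helper, not the authors' implementation.
    x, y: integer pixel coordinates; t: timestamps; p: polarities (>0 is positive).
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    t = np.asarray(t, dtype=np.float64)
    if t.size == 0:
        return voxel

    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    pol = np.where(np.asarray(p) > 0, 1.0, -1.0)

    # Scale timestamps to the range [0, num_bins - 1].
    span = t.max() - t.min()
    if span > 0:
        t_norm = (t - t.min()) / span * (num_bins - 1)
    else:
        t_norm = np.zeros_like(t)

    # Each event contributes to its lower and upper temporal bins.
    lower = np.floor(t_norm).astype(np.int64)
    upper = np.minimum(lower + 1, num_bins - 1)
    frac = t_norm - lower

    # np.add.at accumulates correctly when several events hit the same cell.
    np.add.at(voxel, (lower, y, x), (pol * (1.0 - frac)).astype(np.float32))
    np.add.at(voxel, (upper, y, x), (pol * frac).astype(np.float32))
    return voxel
```

In this sketch the temporal axis becomes the channel axis of the grid. Treating those channels like ordinary image channels is the frame-like representation the abstract says tends to lose the temporal characteristics of events.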
Shangjie Wu
Jihua Zhu
Zhikuan Zhou
ACM Transactions on Multimedia Computing Communications and Applications
Tsinghua University
Xi'an Jiaotong University
National Intelligence University
Wu et al. (Sat,) studied this question.
www.synapsesocial.com/papers/69c9c57ff8fdd13afe0bd74d — DOI: https://doi.org/10.1145/3803017