Depth Prior-Guided 3D Voxel Feature Fusion for 3D Semantic Estimation from Monocular Videos | Synapse