Camera-based 3D occupancy prediction commonly relies on bird’s-eye-view (BEV) representations, yet two limitations remain: optimization instability when inserting new modules into pre-trained BEV encoders, and height-agnostic BEV-to-voxel lifting that fails to preserve elevation-aware scene structure. We propose GSH-Occ (Gradient-Shielded and Height-Aware BEV Occupancy Network), a framework that tackles both issues through two complementary mechanisms. Gradient-Shielded Residual Dual Attention (GS-RDA) introduces a zero-initialized residual gate that preserves the identity mapping at initialization, allowing new attention modules to be grafted onto pre-trained encoders without disturbing learned features. Height-Aware Adaptive Lift (HAL) replaces naive channel replication with per-voxel adaptive fusion of BEV features and learnable height embeddings, followed by 3D convolutional refinement to capture vertical structure. On the Occ3D-nuScenes validation benchmark, GSH-Occ achieves 46.92 mIoU, outperforming FlashOcc by +3.40 mIoU. Ablation studies confirm that GS-RDA and HAL target distinct failure modes and yield complementary improvements.
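The two mechanisms named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the tanh stand-in for the attention branch, and the sigmoid-of-dot-product fusion weight are all assumptions made for illustration, and the 3D convolutional refinement step is omitted. The first part shows why a gate initialized to zero leaves pre-trained features untouched; the second contrasts plain replication along height with a per-voxel blend of the BEV feature and a height embedding.

```python
import math


def attention_branch(x, weights):
    # Hypothetical stand-in for a newly added attention module.
    return [math.tanh(w * v) for w, v in zip(weights, x)]


def gated_residual(x, weights, gate):
    # y = x + gate * branch(x). With gate == 0 the block is an identity
    # mapping, so the pre-trained features pass through unchanged.
    branch = attention_branch(x, weights)
    return [v + gate * b for v, b in zip(x, branch)]


def naive_lift(bev, num_heights):
    # Baseline: copy the same BEV feature to every height bin.
    return [list(bev) for _ in range(num_heights)]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def height_aware_lift(bev, height_emb):
    # Assumed fusion rule (not taken from the paper):
    #   alpha_z = sigmoid(<bev, emb_z>)
    #   voxel_z = alpha_z * bev + (1 - alpha_z) * emb_z
    voxels = []
    for emb in height_emb:
        alpha = sigmoid(sum(b * e for b, e in zip(bev, emb)))
        voxels.append([alpha * b + (1.0 - alpha) * e for b, e in zip(bev, emb)])
    return voxels


if __name__ == "__main__":
    x = [0.3, -1.2, 0.8]
    w = [0.5, 0.1, -0.7]
    # At initialization (gate = 0) the output equals the input.
    print(gated_residual(x, w, gate=0.0) == x)

    bev = [1.0, 2.0]
    height_emb = [[0.0, 0.0], [1.0, 2.0], [-1.0, 0.5]]
    print(naive_lift(bev, 3))
    print(height_aware_lift(bev, height_emb))
```

In this sketch the naive lift returns the same vector at every height, while the height-aware lift produces a different vector per height bin whenever the embeddings differ. In a trained model the gate and the height embeddings would be learned parameters rather than fixed values.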
Bokai Ou
Tianhui LI
Zhigui Lin
Sensors
Tsinghua University
Xiamen University
Xiamen University of Technology
Ou et al. studied this question.
www.synapsesocial.com/papers/69faa2b504f884e66b533489 — DOI: https://doi.org/10.3390/s26092800