Object Goal Navigation (ObjectNav) in novel environments relies on comprehensive scene understanding, including precise visual perception and accurate modeling of spatial-semantic regularities. However, prevailing approaches focus heavily on hand-crafted scene representations and neglect the negative influence of the perception bias hidden in the visual observations. The hand-crafted semantic distribution in domestic environments causes a spurious association bias, while a semantic conflict bias arises from dynamic perspective changes. Biased visual perception significantly limits the generalization of the navigation strategy. In this paper, we propose the Unbiased Embodied Visual Representation (UEVR), which overcomes these perception biases using causal inference and cross-modality alignment. Specifically, we establish reasonable assumptions about confounders for multi-object features through our proposed Unbiased Causal R-CNN framework and eliminate the spurious association bias through the Back-door Intervention Causal Adjustment (BICA) module during navigation. To overcome the dynamic-view bias hidden in 2D image features, we employ a cross-modality alignment mechanism with Geometric Constraints (GeoCon) to encode a 3D geometry prior into the 2D representations. Finally, we design a modular ObjectNav framework integrated with UEVR, named Causal-ObjectNav, which consists of a corner-based scene exploration module and a target object discrimination module. Extensive experiments on the MP3D and HM3D datasets demonstrate the superiority of the unbiased navigation model over existing ObjectNav methods.
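As a rough illustration of the back-door intervention mentioned above, the sketch below applies the standard back-door adjustment formula, P(y | do(x)) = sum over z of P(y | x, z) P(z), to a small discrete example. It is not the BICA module itself: the function name `backdoor_adjust`, the table layout, and the toy probabilities are all assumptions made for illustration, and the confounder z is assumed to be discrete and fully observed.

```python
def backdoor_adjust(p_y_given_x_z, p_z):
    """Return P(y | do(x)) for every x via back-door adjustment.

    p_y_given_x_z[x][z][y] holds P(y | x, z); p_z[z] holds the prior P(z).
    Both tables are hypothetical inputs, not quantities from the paper.
    """
    adjusted = []
    for per_z in p_y_given_x_z:
        n_y = len(per_z[0])
        # Average the conditional over the confounder prior P(z),
        # not over P(z | x), which is what removes the spurious path.
        row = [
            sum(p_z[z] * per_z[z][y] for z in range(len(p_z)))
            for y in range(n_y)
        ]
        adjusted.append(row)
    return adjusted


# Toy example (made-up numbers): two values each for x, z, and y.
p_z = [0.5, 0.5]
p_y_given_x_z = [
    [[0.9, 0.1], [0.5, 0.5]],  # x = 0: rows are z = 0, z = 1
    [[0.6, 0.4], [0.2, 0.8]],  # x = 1: rows are z = 0, z = 1
]
result = backdoor_adjust(p_y_given_x_z, p_z)
```

Weighting by the marginal P(z) rather than by P(z | x) is the step that separates the interventional distribution from the ordinary conditional one. In a feature-learning setting the sum would presumably range over a learned confounder dictionary, which this sketch does not attempt to model.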
Kang et al. (Tue,) studied this question.