Understanding spatial relations (e.g., "laptop on table") in visual input is important for both humans and robots. Existing datasets are insufficient because they lack large-scale, high-quality 3D ground truth information, which is critical for learning spatial relations. In this paper, we fill this gap by constructing Rel3D: the first large-scale, human-annotated dataset for spatial relations in 3D. Rel3D enables quantifying the effectiveness of 3D information in predicting spatial relations on large-scale human data. Moreover, we propose minimally contrastive data collection, a novel method for reducing dataset bias. The 3D scenes in our dataset come in minimally contrastive pairs: two scenes in a pair are almost identical, but a spatial relation holds in one and fails in the other. We empirically validate that minimally contrastive examples can diagnose issues with current detection models as well as lead to sample-efficient training. Code and data are available at https://github.com/princeton-vl/Rel3D.
Goyal et al. studied this question.