Synthetic visual data can provide practically infinite diversity and rich labels, while avoiding ethical issues with privacy and bias. However, for many tasks, current models trained on synthetic data generalize poorly to real data. The task of 3D human pose estimation is a particularly interesting example of this sim2real problem, because learning-based approaches perform reasonably well given real training data, yet labeled 3D poses are extremely difficult to obtain in the wild, limiting scalability. In this paper, we show that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person's motion, notably as optical flow and the motion of 2D keypoints. Therefore, our results suggest that motion can be a simple way to bridge a sim2real gap when video is available. We evaluate on the 3D Poses in the Wild dataset, the most challenging modern benchmark for 3D pose estimation, where we show full 3D mesh recovery that is on par with state-of-the-art methods trained on real 3D sequences, despite training only on synthetic humans from the dataset.
Doersch et al. (Thu,) studied this question.
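As a rough illustration of the "motion of 2D keypoints" cue named in the abstract, the sketch below computes frame-to-frame displacements for a tracked set of joints. It is not the authors' pipeline: the function name `keypoint_motion`, the pixel-coordinate input format, and the toy track are all assumptions made for this example, and the abstract does not say how the paper encodes keypoint motion.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float]   # (x, y) in pixels; assumed format
Frame = List[Keypoint]           # one entry per joint


def keypoint_motion(frames: List[Frame]) -> List[List[Tuple[float, float]]]:
    """Return per-joint (dx, dy) displacements between consecutive frames.

    Hypothetical helper: a simple finite difference, used here only to
    illustrate what a 2D keypoint motion cue could look like.
    """
    motion = []
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != len(curr):
            raise ValueError("every frame must contain the same number of keypoints")
        motion.append(
            [(cx - px, cy - py) for (px, py), (cx, cy) in zip(prev, curr)]
        )
    return motion


# Made-up 3-frame track with two joints (illustrative values only).
track = [
    [(10.0, 20.0), (30.0, 40.0)],
    [(12.0, 20.0), (30.0, 43.0)],
    [(15.0, 21.0), (29.0, 47.0)],
]
deltas = keypoint_motion(track)
print(deltas)
```

A representation like this discards appearance (texture, lighting, clothing) and keeps only how joints move, which is one intuition for why such cues might transfer from synthetic to real video more readily than raw RGB.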