Total Capture: 3D Human Pose Estimation Fusing Video and Inertial Sensors

Abstract

We present an algorithm for fusing multi-viewpoint video (MVV) with inertial measurement unit (IMU) sensor data to accurately estimate 3D human pose. A 3D convolutional neural network is used to learn a pose embedding from volumetric probabilistic visual hull (PVH) data derived from the MVV frames. We incorporate this model within a dual-stream network that integrates pose embeddings derived from the MVV and from a forward solve of the IMU data. A temporal model (LSTM) is incorporated within each stream prior to their fusion. Hybrid pose inference using these two complementary sources is shown to resolve ambiguities within each sensor modality, yielding improved accuracy over prior methods. A further contribution of this work is a new hybrid dataset (TotalCapture) comprising video, IMU data, and skeletal joint ground truth from a commercial motion capture system. The dataset is available online at: //cvssp.org/data/totalcapture/.
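The "forward solve of the IMU data" mentioned in the abstract can be illustrated with a small forward-kinematics sketch. This is not the authors' implementation: it assumes each bone carries one IMU that reports the bone's global orientation, that rest-pose bone offsets are known, and that the root position is given. The names `rot_z`, `apply_rotation`, and `forward_solve` are hypothetical and chosen only for this example.

```python
import math

def rot_z(deg):
    """3x3 rotation about the z-axis (nested lists), angle in degrees."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_rotation(R, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def forward_solve(root_pos, parents, offsets, global_rots):
    """Joint positions from per-bone global orientations (assumed setup).

    parents[j]     : index of joint j's parent, -1 for the root;
                     parents must appear before their children.
    offsets[j]     : rest-pose vector from the parent joint to joint j.
    global_rots[j] : global orientation of the bone ending at joint j,
                     e.g. as reported by an IMU on that bone.
    """
    positions = []
    for j, parent in enumerate(parents):
        if parent < 0:
            positions.append(list(root_pos))
            continue
        # Rotate the rest-pose bone vector into the world frame,
        # then attach it to the already-solved parent joint.
        bone = apply_rotation(global_rots[j], offsets[j])
        positions.append([p + b for p, b in zip(positions[parent], bone)])
    return positions

# Toy three-joint arm: shoulder (root), elbow, wrist, each bone of unit length.
parents = [-1, 0, 1]
offsets = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
rots = [rot_z(0), rot_z(90), rot_z(0)]  # upper arm points up, forearm points along x
joints = forward_solve([0.0, 0.0, 0.0], parents, offsets, rots)
```

Because each bone's position depends only on its own global orientation and its parent's solved position, orientation errors accumulate along the kinematic chain, which is one reason a second modality such as video can help resolve ambiguities.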
