Key points are not available for this paper at this time.
The mapping between 3D body poses and 2D shadows is fundamentally many-to-many and defeats regression methods, even with windowed context. We show how to learn a function between paths in the two systems, resolving ambiguities by integrating information over the entire length of a sequence. The basis of this function is a configural and dynamical manifold that summarizes the target system's behaviour. This manifold can be modeled from data with a hidden Markov model having special topological properties that we obtain via entropy minimization. Inference is then a matter of solving for the geodesic on the manifold that best explains the evidence in the cue sequence. We give a closed-form maximum a posteriori solution for geodesics through the learned density space, thereby obtaining optimal paths over the dynamical manifold. These methods give a completely general way to perform inference over time-series; in vision they support analysis, recognition, classification and synthesis of behaviours in linear time. We demonstrate with a prototype that infers 3D from monocular monochromatic sequences (e.g., back-subtractions), without using any articulatory body model. The framework readily accommodates multiple cameras and other sources of evidence such as optical flow or feature tracking.
Matthew Brand (Fri,) studied this question.