We study a generalized setup for learning from demonstration to build an agent that can manipulate novel objects in unseen scenarios by looking at only a single video of human demonstration from a third-person perspective. To accomplish this goal, our agent should not only learn to understand the intent of the demonstrated third-person video in its context but also perform the task in its environment configuration. Our central insight is to enforce this structure explicitly during learning by decoupling what to achieve (intended task) from how to perform it (controller). We propose a hierarchical setup where a high-level module learns to generate a series of first-person sub-goals conditioned on the third-person video demonstration, and a low-level controller predicts the actions to achieve those sub-goals. Our agent acts from image observations without any access to the full state information. We show results on a real robotic platform using Baxter for the manipulation tasks of pouring and placing objects in a box. Project video and code are at: //pathak22.github.io/hierarchical-imitation/
Sharma et al. studied this question.
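The decoupling of what to achieve from how to perform it can be illustrated with a minimal control-loop sketch. This is not the authors' implementation: the function signatures, the choice to condition each sub-goal on a pair of consecutive demonstration frames, and every name below (`run_episode`, `ToyEnv`, `toy_goal_generator`, `toy_controller`) are assumptions made for illustration. Learned networks are replaced by toy arithmetic, and images by short lists of floats.

```python
from typing import Callable, List, Sequence

Image = List[float]   # toy stand-in for an image observation
Action = List[float]  # toy stand-in for a robot action


def run_episode(
    demo_frames: Sequence[Image],
    observe: Callable[[], Image],
    act: Callable[[Action], None],
    goal_generator: Callable[[Image, Image, Image], Image],
    controller: Callable[[Image, Image], Action],
) -> List[Action]:
    """Follow one third-person demonstration and return the executed actions."""
    actions: List[Action] = []
    for demo_now, demo_next in zip(demo_frames, demo_frames[1:]):
        obs = observe()
        # High level: WHAT to achieve, as a first-person sub-goal.
        sub_goal = goal_generator(obs, demo_now, demo_next)
        # Low level: HOW to achieve it, from the observation alone.
        action = controller(obs, sub_goal)
        act(action)
        actions.append(action)
    return actions


class ToyEnv:
    """Hypothetical environment whose observation is simply its state."""

    def __init__(self, state: Image) -> None:
        self.state = list(state)

    def observe(self) -> Image:
        return list(self.state)

    def act(self, action: Action) -> None:
        self.state = [s + a for s, a in zip(self.state, action)]


def toy_goal_generator(obs: Image, demo_now: Image, demo_next: Image) -> Image:
    # Assumption: transfer the change seen in the demo onto the agent's view.
    return [o + (n - c) for o, c, n in zip(obs, demo_now, demo_next)]


def toy_controller(obs: Image, sub_goal: Image) -> Action:
    # Assumption: the action is the difference between sub-goal and observation.
    return [g - o for o, g in zip(obs, sub_goal)]
```

Because the two modules only communicate through the sub-goal, either one can be swapped out independently in this sketch; for example, `toy_controller` could be replaced by a learned model without touching `toy_goal_generator`.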