Locomotion is fundamental to the repertoire of skills required of physics-based human-like characters. Control policies are most commonly developed using reinforcement learning (RL) with reward functions based on imitation of motion capture data. In this work, we propose an imitation-free RL training pipeline for bipedal locomotion controllers, built on a multistage learning curriculum. Our work makes several contributions. First, it introduces a minimal set of additional specifications that allow imitation-free RL to learn a single policy capable of in-place turning, side-stepping, hopping, and one-step foot plants, in addition to forward and backward walking. Second, the method offers precise and flexible conditioning: it controls footstep locations, with further optional control over footstep timing and footstep orientation. Third, we demonstrate that this imitation-free RL pipeline works across a range of body morphologies. Finally, we show that a plasticity-preservation technique allows for significantly faster learning. Our results demonstrate the scalability and effectiveness of imitation-free RL approaches for developing flexible and highly directable locomotion policies.
Matthé et al. studied this question.