This work presents a hierarchical locomotion framework that integrates high-level, model-based step planning with reinforcement learning (RL). The Linear Inverted Pendulum (LIP) model is used to generate step timing and foot placement targets based on the robot's current state and commanded velocity. By providing only partial guidance from the analytical model, the RL policy benefits from the predictive structure of dynamics-based planning while retaining the flexibility to overcome the limitations of simplified modeling assumptions. Compared to end-to-end learned policies, our method achieves higher sample efficiency and better generalization. The proposed approach is validated on the bipedal robot Neubot, where the learned policy enables stable walking and robust disturbance rejection. Notably, it exhibits significantly improved resilience to external perturbations compared to a fixed-step learned policy.
Liu et al. studied this question.
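To make the role of the LIP model concrete, the following is a minimal sketch of how a planner could turn the current center-of-mass (CoM) state and a commanded velocity into a foot placement target. It is not the planner used in this work: it assumes a one-dimensional LIP with constant CoM height, a fixed step duration supplied by the caller, and a one-step deadbeat rule that picks the foothold so the CoM velocity at the end of the next step equals the command. The names `lip_propagate` and `plan_step` and all parameter values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]


def lip_propagate(x0, v0, com_height, t):
    """Closed-form LIP solution of x'' = (g / z0) * x.

    x0, v0: CoM position and velocity relative to the stance foot.
    Returns the CoM position and velocity after t seconds.
    """
    omega = math.sqrt(G / com_height)
    c, s = math.cosh(omega * t), math.sinh(omega * t)
    x = x0 * c + (v0 / omega) * s
    v = x0 * omega * s + v0 * c
    return x, v


def plan_step(x, v, v_cmd, com_height, time_to_touchdown, step_duration):
    """Return (foot_target, offset) for the next step.

    Assumed rule (not taken from the source): choose the foothold so that
    the LIP reaches v_cmd at the end of the next step (one-step deadbeat).
    foot_target is expressed in the current stance-foot frame; offset is
    the distance of the new foothold ahead of the CoM at touchdown.
    """
    omega = math.sqrt(G / com_height)
    # Predict the CoM state at the upcoming touchdown.
    x_td, v_td = lip_propagate(x, v, com_height, time_to_touchdown)
    c = math.cosh(omega * step_duration)
    s = math.sinh(omega * step_duration)
    # After touchdown the CoM sits at -offset relative to the new foot, so
    # v_end = -offset * omega * s + v_td * c. Solve v_end = v_cmd for offset.
    offset = (v_td * c - v_cmd) / (omega * s)
    return x_td + offset, offset


if __name__ == "__main__":
    foot, offset = plan_step(x=0.02, v=0.4, v_cmd=0.5, com_height=0.8,
                             time_to_touchdown=0.15, step_duration=0.35)
    print(f"foot target: {foot:.3f} m, offset ahead of CoM: {offset:.3f} m")
```

In a hierarchical setup of the kind described above, a target such as `foot_target` would be passed to the RL policy as partial guidance rather than tracked exactly, which leaves the policy free to deviate where the LIP assumptions (point mass, constant height, instantaneous support switch) break down.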