Lunar quadruped robots face landing challenges that include weak gravity, large mass variations, uncertain sloped terrain, and strict payload acceleration limits, all of which demand effective impact mitigation and rapid post-landing stabilization. This paper presents a novel end-to-end reinforcement learning-based landing controller with three key contributions: (i) a phase-structured yet implicitly encoded formulation that distinguishes contact preparation, energy dissipation, and stabilization without explicit phase switching; (ii) a terrain-agnostic state and control representation that uses equivalent support direction construction and contact-gated modulation to decouple normal–tangential dynamics; and (iii) an extremum-oriented learning strategy that directly captures peak impact suppression and buffering sufficiency, addressing the limitations of cumulative rewards in hybrid, peak-dominated tasks. A hybrid control model of lunar quadruped landing dynamics is established, with variable mass, low impact, and full stroke incorporated as key constraints during training. Simulation and full-scale experimental prototypes are built to validate the controller. Simulation results demonstrate robust landing buffering and stability control under varying mass, landing velocity, and slope, as well as under parameter variations. Experiments are conducted under different masses (200 kg, 250 kg), vertical and horizontal landing velocities (0.8 m/s and 0.2 m/s), and slopes (0°, 8°). The deviation between simulation and experimental results does not exceed 30%, supporting the effectiveness and transferability of the proposed approach.
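To illustrate why a cumulative reward can be a poor fit for a peak-dominated task, the following minimal sketch contrasts a summed per-step penalty with an episode-level score built from extrema. It is not the paper's reward function: the names `cumulative_penalty` and `extremum_reward`, the limit values, and the interpretation of "buffering sufficiency" as the fraction of buffer stroke used are all assumptions made for illustration only.

```python
def cumulative_penalty(accels):
    """Conventional per-step penalty summed over the episode."""
    return -sum(abs(a) for a in accels)


def extremum_reward(accels, strokes, accel_limit, full_stroke):
    """Hypothetical episode-level score based on extrema.

    accel_limit and full_stroke are assumed design parameters,
    not values taken from the paper.
    """
    peak_accel = max(abs(a) for a in accels)   # worst-case impact
    max_stroke = max(strokes)                  # deepest buffer travel
    # Penalize only the part of the peak that exceeds the payload limit.
    overshoot = max(0.0, peak_accel / accel_limit - 1.0)
    # Reward using the available stroke, capped at full stroke.
    stroke_use = min(max_stroke / full_stroke, 1.0)
    return stroke_use - overshoot


# Two illustrative acceleration profiles with the same summed magnitude.
spiky = [0.0, 0.0, 30.0, 0.0, 0.0]
smooth = [6.0, 6.0, 6.0, 6.0, 6.0]
strokes = [0.0, 0.05, 0.1, 0.1, 0.1]

same_sum = cumulative_penalty(spiky) == cumulative_penalty(smooth)
r_spiky = extremum_reward(spiky, strokes, accel_limit=10.0, full_stroke=0.1)
r_smooth = extremum_reward(smooth, strokes, accel_limit=10.0, full_stroke=0.1)
```

In this toy case the summed penalty cannot tell the two profiles apart, while the extremum-based score ranks the smooth profile above the spiky one because only the spiky profile exceeds the assumed acceleration limit.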
Li et al. (Thu,) studied this question.