On-robot reinforcement learning is a promising approach to training embodiment-aware policies for legged robots. However, the computational constraints of real-time learning on the robot pose a significant challenge. We present a framework for efficiently learning quadruped locomotion in just 8 minutes of raw real-time training, leveraging the sample efficiency and minimal computational overhead of the recently proposed off-policy algorithm CrossQ. We investigate two control architectures: predicting joint target positions for agile, high-speed locomotion, and Central Pattern Generators (CPGs) for stable, natural gaits. While prior work focused on learning simple forward gaits, our framework extends on-robot learning to omnidirectional locomotion. We demonstrate the robustness of our approach in a range of indoor and outdoor environments.
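The abstract does not say how its Central Pattern Generators are formulated. As a rough illustration of the general idea only, the sketch below couples four phase oscillators (Kuramoto-style), one per leg, so that they lock into a trot and produce a periodic swing height for each foot. The class name, the trot offsets, the frequency, the coupling gain, the step height, and the `freq_scale` hook standing in for a policy action are all hypothetical choices, not details taken from the paper.

```python
import math

TWO_PI = 2.0 * math.pi

# Hypothetical trot phase offsets (radians) for legs FL, FR, RL, RR:
# diagonal pairs (FL, RR) and (FR, RL) move in phase; the two pairs are half a cycle apart.
TROT_OFFSETS = [0.0, math.pi, math.pi, 0.0]


class PhaseOscillatorCPG:
    """Four coupled phase oscillators (Kuramoto-style), one per leg.

    Generic illustration only; not the CPG formulation used in the paper.
    """

    def __init__(self, freq_hz=2.0, coupling=5.0, offsets=TROT_OFFSETS):
        self.freq_hz = freq_hz
        self.coupling = coupling
        self.offsets = list(offsets)
        # Start slightly out of sync so the coupling has an error to correct.
        perturbation = [0.3, -0.2, 0.1, -0.4]
        self.phases = [(o + d) % TWO_PI for o, d in zip(self.offsets, perturbation)]

    def step(self, dt, freq_scale=1.0):
        """Advance all phases by one explicit Euler step of length dt (seconds).

        freq_scale is a hypothetical stand-in for a policy action that
        modulates the gait frequency.
        """
        omega = TWO_PI * self.freq_hz * freq_scale
        new_phases = []
        for i, phi_i in enumerate(self.phases):
            # Each term pulls phi_i toward phi_j - (offset_j - offset_i).
            coupling_term = sum(
                math.sin(phi_j - phi_i - (self.offsets[j] - self.offsets[i]))
                for j, phi_j in enumerate(self.phases)
                if j != i
            )
            dphi = omega + self.coupling * coupling_term
            new_phases.append((phi_i + dphi * dt) % TWO_PI)
        self.phases = new_phases
        return self.phases

    def foot_heights(self, step_height=0.08):
        """Swing height while sin(phase) > 0, zero (stance) otherwise."""
        return [step_height * max(0.0, math.sin(phi)) for phi in self.phases]


cpg = PhaseOscillatorCPG()
for _ in range(500):  # 5 s of simulated time at dt = 0.01 s
    cpg.step(0.01)
print([round(h, 3) for h in cpg.foot_heights()])
```

Because every leg is coupled to the others, a phase error on one leg decays back toward the trot pattern, which is one reason CPG-based controllers tend to yield stable, regular gaits; a learned policy would then only need to modulate quantities such as frequency or amplitude rather than command every joint directly.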
Nico Bohlinger
Jonathan Kinzel
Daniel Palenicek
Bohlinger et al. studied this question.
www.synapsesocial.com/papers/68da58c9c1728099cfd10a39 — DOI: https://doi.org/10.48550/arxiv.2503.08375