May 22, 2026 · Open Access

Robust Locomotion Control of Quadrupedal Wheel-Legged Robots via Contrastive History-Aware Reinforcement Learning in Complex Environments

Key Points

Aimed to develop a robust locomotion control framework for quadrupedal wheel-legged robots that requires no external sensors.
Proposed an end-to-end reinforcement learning framework for implicit state estimation that incorporates terrain and external-force features.
Used a history of purely proprioceptive observations to extract explicit kinematic responses and implicit environmental and external-force representations.
Integrated a tailored composite reward function with progressive curriculum training and large-scale domain randomization.
Reduced the lateral linear velocity tracking error under external disturbances from 0.2421 m/s to 0.1319 m/s.
Achieved zero-shot sim-to-real transfer with improved sample efficiency.
Demonstrated agile and robust locomotion over diverse terrains in cross-simulator validations and real-world deployments.
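For scale, the two lateral velocity tracking-error values reported above correspond to an absolute reduction of 0.1102 m/s, or a relative reduction of roughly 45%, computed directly from those numbers:

```latex
\frac{0.2421 - 0.1319}{0.2421} = \frac{0.1102}{0.2421} \approx 0.455
```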

Abstract

Quadrupedal wheel-legged robots offer exceptional mobility in complex terrain, but robust locomotion control is severely hindered by the difficulty of accurate state estimation without external sensors. Existing reinforcement learning methods that rely on two-stage imitation often suffer from representation collapse and information loss during sim-to-real transfer. To address these challenges, this paper proposes an end-to-end reinforcement learning framework for implicit state estimation that incorporates terrain and external-force features. Inspired by internal model control, the method uses a history of purely proprioceptive observations to extract explicit kinematic responses, together with implicit environmental and external-force representations learned via prototypical contrastive learning; this avoids explicit terrain regression and removes the need for physical force sensors. In addition, a tailored composite reward function and a progressive curriculum training strategy with large-scale domain randomization are integrated to ensure dynamic stability and hardware safety. Extensive cross-simulator validations and real-world deployments show that the approach achieves highly agile and robust locomotion, including adaptive traversal of diverse terrains. Under external disturbances, the method markedly improves robustness, reducing the lateral linear velocity tracking error from 0.2421 m/s to 0.1319 m/s. It achieves zero-shot sim-to-real transfer with superior sample efficiency, providing a reliable and universal control paradigm for wheel-legged robots in unstructured environments.
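The abstract describes encoding a history of proprioceptive observations and shaping the resulting latent space with prototypical contrastive learning. The encoder architecture, history length, and exact loss are not given here, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' formulation. It assumes a hypothetical 12-dimensional proprioceptive frame and a 5-step history window. A random linear map stands in for the learned encoder, the prototypes are fixed and random, and the loss is a swapped-assignment objective with hard nearest-prototype codes, a simplification of SwAV-style prototype losses. All function names (`stack_history`, `prototype_swap_loss`) are illustrative.

```python
import numpy as np


def stack_history(obs_seq, horizon):
    """Return one flattened window of the last `horizon` frames per time step.

    obs_seq has shape (T, obs_dim). Early steps are padded by repeating the
    first frame, so the output always has shape (T, horizon * obs_dim).
    """
    pad = np.repeat(obs_seq[:1], horizon - 1, axis=0)
    padded = np.concatenate([pad, obs_seq], axis=0)
    # Row t covers original frames t - horizon + 1 .. t (oldest first).
    return np.stack([padded[t:t + horizon].reshape(-1) for t in range(len(obs_seq))])


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def prototype_swap_loss(z_a, z_b, prototypes, temperature=0.1):
    """Swapped-prediction loss with hard nearest-prototype codes (hypothetical simplification).

    Both views are scored against shared prototypes; the nearest prototype of
    one view is the target class that the other view must predict.
    """
    c = l2_normalize(prototypes)
    logits_a = l2_normalize(z_a) @ c.T / temperature
    logits_b = l2_normalize(z_b) @ c.T / temperature
    codes_a = logits_a.argmax(axis=1)
    codes_b = logits_b.argmax(axis=1)
    rows = np.arange(len(z_a))
    loss_ab = -log_softmax(logits_b)[rows, codes_a].mean()
    loss_ba = -log_softmax(logits_a)[rows, codes_b].mean()
    return 0.5 * (loss_ab + loss_ba)


rng = np.random.default_rng(0)
obs = rng.normal(size=(64, 12))          # 64 steps of a hypothetical 12-D proprioceptive frame
history = stack_history(obs, horizon=5)  # shape (64, 60)
encoder = rng.normal(size=(history.shape[1], 16)) / np.sqrt(history.shape[1])  # stand-in for a learned encoder
view_a = history @ encoder
view_b = (history + 0.01 * rng.normal(size=history.shape)) @ encoder  # lightly perturbed second view
prototypes = rng.normal(size=(8, 16))    # hypothetical 8 latent "environment" prototypes
loss = prototype_swap_loss(view_a, view_b, prototypes)
print(history.shape, float(loss))
```

In an actual training loop the encoder and prototypes would be learnable and updated by gradient descent alongside the policy. Hard argmax codes can collapse onto a single prototype, which is why SwAV balances assignments across prototypes with Sinkhorn-Knopp iterations.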
