Multilayer Action Correction Perceptron for Overcoming the Barrier between Simulation and the Real World when Training Quadrupedal Robot Policies

Key Points

The method corrects the actions of a policy that is moved between physics simulators (Sim2Sim), improving its adaptability across virtual environments.
The Action Correction Network (ACN) compensates for discrepancies in simulator dynamics, which improves robot locomotion after transfer.
Experimental validation consists of transferring a walking policy for the Unitree A1 quadruped between the PyBullet and MuJoCo simulators.
The work suggests potential for real-world quadrupedal performance, though real-world testing is still needed.

Abstract

This paper presents an innovative approach to transferring reinforcement learning policies between different physics simulators (Sim2Sim). We propose the Action Correction Network (ACN), a two-component neural network architecture that corrects policy actions to account for differences in simulator dynamics. The effectiveness of the method is demonstrated experimentally by transferring a walking policy for the Unitree A1 quadruped robot between the PyBullet and MuJoCo simulators.
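To make the idea of action correction concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not describe the two components of the ACN, so this example assumes a single residual multilayer perceptron that takes the observation and the policy action and outputs an additive correction. The names `init_mlp`, `mlp_forward`, and `correct_action`, the observation size `OBS_DIM`, the hidden layer widths, and the action limit are all hypothetical. Only the action size follows from the robot itself, since the Unitree A1 has 12 actuated joints.

```python
import numpy as np

OBS_DIM = 48  # hypothetical observation size, not taken from the paper
ACT_DIM = 12  # Unitree A1: 4 legs x 3 actuated joints


def init_mlp(rng, sizes):
    """Create (weight, bias) pairs for a fully connected network."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weight = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        params.append((weight, np.zeros(n_out)))
    # Assumption: zero the output layer so the untrained network
    # leaves the policy action unchanged.
    last_weight, last_bias = params[-1]
    params[-1] = (np.zeros_like(last_weight), last_bias)
    return params


def mlp_forward(params, x):
    """Apply tanh hidden layers followed by a linear output layer."""
    for weight, bias in params[:-1]:
        x = np.tanh(x @ weight + bias)
    weight, bias = params[-1]
    return x @ weight + bias


def correct_action(params, obs, action, limit=1.0):
    """Add a learned correction to the action and clip to the action range."""
    delta = mlp_forward(params, np.concatenate([obs, action], axis=-1))
    return np.clip(action + delta, -limit, limit)


rng = np.random.default_rng(0)
params = init_mlp(rng, [OBS_DIM + ACT_DIM, 64, 64, ACT_DIM])
obs = rng.normal(size=OBS_DIM)                    # stand-in for a simulator observation
action = rng.uniform(-1.0, 1.0, size=ACT_DIM)     # stand-in for the source policy's output
corrected = correct_action(params, obs, action)
```

In a setup like this, the source policy would stay frozen and only the correction network would be trained in the target simulator. How the paper actually trains the ACN is not stated in the abstract.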

Cite This Study

Geroyev et al. studied this question.

synapsesocial.com/papers/69a75d12c6e9836116a2681d https://doi.org/10.1134/s1064226925700433
