This paper presents an approach to transferring reinforcement learning policies between different physics simulators (Sim2Sim). We propose the Action Correction Network (ACN), a two-component neural network architecture that corrects the policy's actions to account for differences in simulator dynamics. The method's effectiveness is demonstrated experimentally by transferring a walking policy for the Unitree A1 quadruped robot between the PyBullet and MuJoCo simulators.
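To make the idea of action correction concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say what the two components of the ACN are, so the split into an encoder and a residual correction head is an assumption, as are the class name `ActionCorrectionSketch`, the layer sizes, the additive form of the correction, and the observation dimension. The only grounded number is the action dimension of 12, which matches the A1's 12 actuated joints.

```python
import numpy as np


class ActionCorrectionSketch:
    """Hypothetical two-part corrector: an encoder plus a residual correction head."""

    def __init__(self, obs_dim, act_dim, hidden_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        # Component 1 (assumed): encodes the observation together with the
        # action proposed by the policy trained in the source simulator.
        self.w_enc = rng.normal(0.0, 0.1, size=(obs_dim + act_dim, hidden_dim))
        self.b_enc = np.zeros(hidden_dim)
        # Component 2 (assumed): maps the encoding to an additive correction.
        # Zero initialisation makes the untrained corrector an identity mapping.
        self.w_head = np.zeros((hidden_dim, act_dim))
        self.b_head = np.zeros(act_dim)

    def correct(self, obs, action):
        """Return the policy action plus a learned residual correction."""
        features = np.tanh(np.concatenate([obs, action]) @ self.w_enc + self.b_enc)
        delta = features @ self.w_head + self.b_head
        return action + delta


# obs_dim is a placeholder; act_dim = 12 matches the A1's actuated joints.
obs_dim, act_dim = 48, 12
acn = ActionCorrectionSketch(obs_dim, act_dim)

obs = np.zeros(obs_dim)
policy_action = np.linspace(-1.0, 1.0, act_dim)
corrected = acn.correct(obs, policy_action)
```

With the correction head initialised to zero, the sketch returns the policy action unchanged before any training. In a real setup the weights would be trained so that the corrected action in the target simulator reproduces the behaviour the original action produced in the source simulator.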
Geroyev et al. studied this question.