April 10, 2024

Stabilizing Diffusion Model for Robotic Control With Dynamic Programming and Transition Feasibility

Abstract

Because of its strong ability to represent distributions, the diffusion model has been incorporated into offline reinforcement learning (RL) to cover the diverse trajectories of a complex behavior policy. However, this also raises several challenges. A diffusion model trained to imitate the collected trajectories has limited stitching capability, that is, limited ability to derive better policies from suboptimal trajectories. Furthermore, the inherent randomness of the diffusion model can lead to unpredictable control and dangerous behavior for the robot. To address these concerns, we propose the Value-learning-based Decision Diffuser (V-DD), which consists of the trajectory diffusion module (TDM) and the trajectory evaluation module (TEM). During training, the TDM combines the state value with classifier-free guidance to strengthen its ability to stitch suboptimal trajectories. During inference, the TEM selects a feasible trajectory from those generated by the diffusion model. Empirical results demonstrate that our method delivers competitive results on the D4RL benchmark and substantially outperforms current diffusion-model-based methods on a real-world robot task.
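
The inference-time selection step can be illustrated with a minimal sketch. The abstract does not specify how the TEM scores trajectories, so everything here is an assumption made for illustration: the function `select_trajectory`, the `value_fn` and `feasibility_fn` callables, the `threshold` parameter, and the rule of scoring a trajectory by its least feasible transition are all hypothetical and are not the authors' implementation.

```python
import numpy as np


def select_trajectory(candidates, value_fn, feasibility_fn, threshold=0.5):
    """Return the index of the chosen candidate trajectory.

    candidates: sequence of arrays, each of shape (horizon, state_dim).
    value_fn(state) -> float: hypothetical learned state-value estimate.
    feasibility_fn(s, s_next) -> float: hypothetical transition score in [0, 1].
    """
    # Assumption: a trajectory is only as feasible as its worst transition.
    feas = np.array([
        min(feasibility_fn(traj[t], traj[t + 1]) for t in range(len(traj) - 1))
        for traj in candidates
    ])
    # Assumption: rank trajectories by the value of their final state.
    values = np.array([value_fn(traj[-1]) for traj in candidates])

    feasible = feas >= threshold
    if feasible.any():
        # Pick the highest-value trajectory among the feasible ones.
        return int(np.argmax(np.where(feasible, values, -np.inf)))
    # Fallback: no candidate passes, so take the most feasible one.
    return int(np.argmax(feas))


# Toy stand-ins for the learned models (illustration only).
def toy_value(state):
    return float(np.sum(state))


def toy_feasibility(s, s_next):
    # Small steps score near 1, large jumps score near 0.
    return float(np.exp(-np.linalg.norm(s_next - s)))


candidates = [
    np.array([[0.0], [0.1], [0.2]]),   # small steps, low final value
    np.array([[0.0], [5.0], [10.0]]),  # high final value, infeasible jumps
    np.array([[0.0], [0.2], [0.4]]),   # small steps, higher final value
]
best = select_trajectory(candidates, toy_value, toy_feasibility)
```

In this toy setup the second candidate has the highest value but is rejected for its large jumps, which mirrors the motivation stated above: filtering sampled trajectories guards against the randomness of the generative model.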
