Owing to its strong ability to represent distributions, the diffusion model has been incorporated into offline reinforcement learning (RL) to cover the diverse trajectories of a complex behavior policy. However, this also raises several challenges. Training the diffusion model to imitate behavior from the collected trajectories limits its stitching capability, that is, its ability to derive better policies from suboptimal trajectories. Furthermore, the inherent randomness of the diffusion model can lead to unpredictable control and dangerous behavior for the robot. To address these concerns, we propose the Value-learning-based Decision Diffuser (V-DD), which consists of the trajectory diffusion module (TDM) and the trajectory evaluation module (TEM). During training, the TDM combines the state value with classifier-free guidance to strengthen its ability to stitch suboptimal trajectories. During inference, the TEM selects a feasible trajectory from those generated by the diffusion model. Empirical results demonstrate that our method delivers competitive results on the D4RL benchmark and substantially outperforms current diffusion-model-based methods on a real-world robot task.
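As a rough illustration of the two ideas named in the abstract, the sketch below shows the standard classifier-free guidance combination of conditional and unconditional noise predictions, followed by a pick-the-best step over candidate trajectories. The abstract does not give V-DD's actual formulas, so everything here is an assumption: the names `cfg_combine` and `select_trajectory`, the guidance weight `w`, the toy 1-D trajectories, and the scoring function are all hypothetical stand-ins, not the authors' TDM or TEM.

```python
from typing import Callable, List, Sequence

Trajectory = List[float]  # toy 1-D state sequence (assumption for illustration)


def cfg_combine(eps_uncond: Sequence[float],
                eps_cond: Sequence[float],
                w: float) -> List[float]:
    """Standard classifier-free guidance: eps_u + w * (eps_c - eps_u).

    w = 0 gives the unconditional prediction, w = 1 the conditional one,
    and w > 1 extrapolates further toward the condition.
    """
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def select_trajectory(candidates: Sequence[Trajectory],
                      score: Callable[[Trajectory], float]) -> Trajectory:
    """Return the highest-scoring candidate.

    Hypothetical stand-in for an evaluation step; the real TEM's
    feasibility criterion is not described in the abstract.
    """
    if not candidates:
        raise ValueError("no candidate trajectories to choose from")
    return max(candidates, key=score)


if __name__ == "__main__":
    # Guided noise prediction for a two-dimensional toy example.
    print(cfg_combine([0.0, 1.0], [1.0, 3.0], w=2.0))

    # Choose among sampled candidates using a toy score (the final state).
    samples = [[0.0, 1.0], [0.0, 3.0], [0.0, 2.0]]
    print(select_trajectory(samples, score=lambda traj: traj[-1]))
```

In this sketch the condition would be a value signal and the score a learned evaluator; both are replaced by toy quantities so that the block stays self-contained.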
Haoran Li
Yaocheng Zhang
Haowei Wen
IEEE Transactions on Artificial Intelligence
Chinese Academy of Sciences
University of Chinese Academy of Sciences
Institute of Automation
Li et al. studied this question.
www.synapsesocial.com/papers/68e6fa8ab6db643587674d59 — DOI: https://doi.org/10.1109/tai.2024.3387401