June 25, 2024 · Open Access

Boosting Soft Q-Learning by Bounding

Abstract

An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations of solutions to a new task. In soft Q-learning, we show how any value function estimate can also be used to derive double-sided bounds on the optimal value function. The derived bounds lead to new approaches for boosting training performance, which we validate experimentally. Notably, we find that the proposed framework suggests an alternative method for updating the Q-function, leading to boosted performance.
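To illustrate how an arbitrary estimate can yield double-sided bounds, the sketch below uses the standard contraction argument for the soft Bellman backup T: because T is monotone and shifts by γc when its input shifts by a constant c, any Q satisfies TQ + γ·min(TQ − Q)/(1 − γ) ≤ Q* ≤ TQ + γ·max(TQ − Q)/(1 − γ). This is not taken from the paper and may differ from the bounds derived there. The tabular MDP, the uniform prior policy, the inverse temperature `beta`, and all function names are assumptions made for illustration.

```python
import math
import random


def soft_backup(Q, R, P, gamma, beta):
    """One soft Bellman backup under a uniform prior policy (an assumption)."""
    n_s, n_a = len(R), len(R[0])
    V = []
    for s in range(n_s):
        m = max(Q[s])  # subtract the max for a numerically stable log-mean-exp
        mean_exp = sum(math.exp(beta * (q - m)) for q in Q[s]) / n_a
        V.append(m + math.log(mean_exp) / beta)
    return [
        [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
         for a in range(n_a)]
        for s in range(n_s)
    ]


def bounds_from_estimate(Q, R, P, gamma, beta):
    """Lower and upper bounds on the optimal soft Q-function from any estimate Q."""
    TQ = soft_backup(Q, R, P, gamma, beta)
    # Backup residuals TQ - Q over all state-action pairs.
    gaps = [tq - q for tq_row, q_row in zip(TQ, Q) for tq, q in zip(tq_row, q_row)]
    lo_shift = gamma * min(gaps) / (1.0 - gamma)
    hi_shift = gamma * max(gaps) / (1.0 - gamma)
    lower = [[tq + lo_shift for tq in row] for row in TQ]
    upper = [[tq + hi_shift for tq in row] for row in TQ]
    return lower, upper


def random_mdp(n_s, n_a, seed=0):
    """Hypothetical random tabular MDP: rewards in [-1, 0], dense transitions."""
    rng = random.Random(seed)
    R = [[rng.uniform(-1.0, 0.0) for _ in range(n_a)] for _ in range(n_s)]
    P = []
    for _ in range(n_s):
        rows = []
        for _ in range(n_a):
            w = [rng.random() for _ in range(n_s)]
            total = sum(w)
            rows.append([x / total for x in w])  # normalize to a distribution
        P.append(rows)
    return R, P


def solve_soft_q(R, P, gamma, beta, iters=2000):
    """Approximate the optimal soft Q-function by repeated backups."""
    Q = [[0.0] * len(R[0]) for _ in R]
    for _ in range(iters):
        Q = soft_backup(Q, R, P, gamma, beta)
    return Q


R, P = random_mdp(n_s=4, n_a=2)
gamma, beta = 0.9, 2.0
Q_star = solve_soft_q(R, P, gamma, beta)

# An arbitrary (deliberately poor) estimate still brackets Q_star.
rng = random.Random(1)
Q_guess = [[rng.uniform(-5.0, 5.0) for _ in range(2)] for _ in range(4)]
lower, upper = bounds_from_estimate(Q_guess, R, P, gamma, beta)
```

The width of the bracket is γ·(max − min of the residual)/(1 − γ), so it shrinks as the estimate approaches a fixed point of the backup and collapses when the residual is constant.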

Cite This Study

Adamczyk et al. studied this question.

synapsesocial.com/papers/68e636c5b6db6435875c8d8d https://doi.org/10.48550/arxiv.2406.18033
