July 26, 2024 | Open Access

Optimizing dialog policy with large action spaces using deep reinforcement learning

Abstract

A dialogue policy selects the next appropriate action from the current dialogue state so that the user goal is accomplished efficiently. Present commercial task-oriented dialogue systems are mostly rule-based and therefore do not scale easily to multiple domains. User feedback is an essential input for designing an adaptive dialogue policy. Recently, deep reinforcement learning algorithms have been widely applied to such problems. However, handling a large state-action space is time-consuming and computationally expensive. Training the dialogue policy also requires a reliable, good-quality user simulator, which takes additional design effort. In this paper, we propose a novel approach that improves dialogue policy performance by using imitation learning to accelerate deep reinforcement learning training. We used the proximal policy optimization (PPO) algorithm to model the dialogue policy on MultiWOZ2.1, a large-scale multi-domain tourist dataset. The dialogue policy achieved a 91.8% task success rate and an approximately 50% decrease in the average number of turns required to complete tasks, without using a user simulator in the early phase of training. This approach is expected to help researchers design computationally efficient and scalable dialogue agents by avoiding training from scratch.
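
The abstract does not give implementation details, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions: imitation learning is taken to mean behaviour cloning with a cross-entropy loss on expert dialogue actions, and PPO is represented by its standard clipped surrogate objective for a single sample. The function names and the default `clip_eps=0.2` are illustrative choices, not taken from the paper.

```python
import math

def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def imitation_loss(logits, expert_action):
    """Behaviour-cloning loss (assumed form): negative log-likelihood
    of the action the expert took in this dialogue state."""
    return -math.log(softmax(logits)[expert_action])

def ppo_clipped_objective(new_prob, old_prob, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one (state, action) sample.
    clip_eps=0.2 is an assumed, commonly used default."""
    ratio = new_prob / old_prob
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)

# Hypothetical usage: warm-start on expert data, then fine-tune with PPO.
bc_loss = imitation_loss([2.0, 0.5, -1.0], expert_action=0)
surrogate = ppo_clipped_objective(new_prob=0.9, old_prob=0.5, advantage=1.0)
```

In a setup like this, minimizing the imitation loss on logged dialogues would give the policy a reasonable starting point, so that the PPO phase does not have to explore a large action space from scratch.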

Thakkar et al. studied this question.

synapsesocial.com/papers/68e5ef80b6db643587583e92 https://doi.org/10.11591/ijeecs.v36.i1.pp428-440
