Negotiation is a complex process that requires skills such as strategic reasoning and communication. Most prior work trains negotiation dialogue agents against a small set of fixed opponents, so the resulting agents are effective only against those opponents, with limited strategy styles and limited performance across varying opponents. To obtain better and more comprehensive strategies, we propose a novel self-play reinforcement learning (RL) framework for negotiation dialogues, named α-Nego, which trains an RL agent against continuously improving opponents. For training, we introduce a holistic scoring approach that integrates utility with dialogue quality metrics (Agreement, Length, Social welfare), and we implement a tiered criterion for admitting selected opponents to the pool: utility dominance is the primary condition, and the holistic score components serve as deterministic tie-breakers, so that selection pressure reflects both task success and dialogue quality. Furthermore, α-Nego uses a value distribution to improve policy evaluation. Applying different criteria to this value distribution yields negotiation strategies of different styles that reflect different risk attitudes. Empirical evaluation on the CraigslistBargain and DealOrNoDeal datasets shows that the α-Nego agent outperforms state-of-the-art baselines.
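The tiered pool-admission criterion can be illustrated with a minimal sketch. It is not the α-Nego implementation: the names `EvalStats`, `admission_key`, and `admits` are hypothetical, and the sketch assumes that a candidate must rank above every current pool member, that tie-breakers are applied in the order Agreement, Social welfare, Length, and that shorter dialogues are preferred. None of these details is stated in the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class EvalStats:
    """Hypothetical evaluation summary for one agent (not from the paper)."""
    utility: float         # average task utility
    agreement: float       # agreement rate in [0, 1]
    length: float          # average dialogue length in turns
    social_welfare: float  # joint utility of both parties


def admission_key(s: EvalStats) -> Tuple[float, float, float, float]:
    # Tuples compare lexicographically, so utility decides first and the
    # remaining components only matter when utility is exactly tied.
    # Assumed tie-break order: agreement, social welfare, shorter length.
    return (s.utility, s.agreement, s.social_welfare, -s.length)


def admits(candidate: EvalStats, pool: Iterable[EvalStats]) -> bool:
    # Assumption: "utility dominance" means ranking strictly above every
    # current pool member under the tiered key. An empty pool admits anyone.
    return all(admission_key(candidate) > admission_key(m) for m in pool)


pool = [
    EvalStats(utility=0.60, agreement=0.90, length=12.0, social_welfare=1.1),
    EvalStats(utility=0.55, agreement=0.95, length=10.0, social_welfare=1.2),
]

# Higher utility wins even with worse dialogue-quality metrics.
higher_utility = EvalStats(0.62, 0.50, 20.0, 0.9)
# Utility tied with the best pool member; shorter dialogues break the tie.
tied_utility = EvalStats(0.60, 0.90, 10.0, 1.1)
# Better dialogue quality cannot compensate for lower utility.
lower_utility = EvalStats(0.58, 1.00, 5.0, 2.0)

print(admits(higher_utility, pool))
print(admits(tied_utility, pool))
print(admits(lower_utility, pool))
```

Using a lexicographic key keeps the tie-breaking deterministic: the dialogue quality components never override a utility difference, which matches the "utility first, holistic components second" ordering described above.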
Chen et al. (Mon,) studied this question.