What question did this study set out to answer?

The research aims to enhance negotiation dialogues by developing a self-play reinforcement learning framework.

May 13, 2026 | Open Access

α-Nego: Self-Play Deep Reinforcement Learning for Negotiation Dialogues

Key Points

Introduced a self-play reinforcement learning framework named α-Nego for training dialogue agents.
Integrated a holistic scoring approach combining utility and dialogue quality metrics.
Implemented a tiered criterion for selecting opponents based on utility dominance and holistic scores.
Enhanced policy evaluation using a value distribution to capture different negotiation strategies.
The α-Nego agent showed improved performance over state-of-the-art baseline agents.
Empirical evaluation demonstrated better negotiation outcomes on the CraigslistBargain and DealOrNoDeal datasets.
The scoring approach effectively balanced utility and dialogue quality metrics.
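
To make the value-distribution point above concrete, here is a minimal sketch of how different criteria applied to one value distribution can produce different risk attitudes. It is not the authors' implementation. It assumes each action's value distribution is summarized by a few equally weighted quantile estimates, and it uses a mean for the risk-neutral case and a lower-tail or upper-tail mean for the risk-averse and risk-seeking cases. The action names, the numbers, and the helpers `tail_mean` and `pick_action` are hypothetical.

```python
from statistics import mean


def risk_neutral(quantiles):
    """Expected value: the mean of equally weighted quantile estimates."""
    return mean(quantiles)


def tail_mean(quantiles, alpha, tail):
    """Mean of the lowest (tail='lower') or highest (tail='upper')
    alpha-fraction of the quantile estimates."""
    ordered = sorted(quantiles)
    k = max(1, round(alpha * len(ordered)))
    chosen = ordered[:k] if tail == "lower" else ordered[-k:]
    return mean(chosen)


# Assumed mapping from a risk attitude to a scoring criterion.
CRITERIA = {
    "risk_neutral": risk_neutral,
    "risk_averse": lambda q: tail_mean(q, 0.5, "lower"),
    "risk_seeking": lambda q: tail_mean(q, 0.5, "upper"),
}


def pick_action(value_dists, criterion):
    """Return the action whose value distribution scores highest."""
    score = CRITERIA[criterion]
    return max(value_dists, key=lambda action: score(value_dists[action]))


# Hypothetical value distributions (quantile estimates) for two actions.
value_dists = {
    "hold_firm": [0.0, 0.2, 0.8, 1.0],
    "concede": [0.40, 0.45, 0.50, 0.55],
}

for name in CRITERIA:
    print(name, "->", pick_action(value_dists, name))
```

In this toy setting the risk-averse criterion prefers the narrow distribution of `concede`, while the risk-neutral and risk-seeking criteria prefer the wider distribution of `hold_firm`.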

Abstract

Negotiation is a complex process that requires skills such as strategic reasoning and communication. Most prior work trains negotiation dialogue agents against a few fixed opponents, so the resulting agents are effective only against those opponents, and both their range of strategy styles and their performance against varied opponents are limited. To yield better and more comprehensive strategies, we propose a novel self-play reinforcement learning (RL) framework for negotiation dialogues, named α-Nego, which trains an RL agent against continuously improving opponents. For training, we introduce a holistic scoring approach that integrates utility with dialogue quality metrics (Agreement, Length, Social welfare), and we implement a tiered criterion for admitting selected opponents to the pool: utility dominance is primary, with holistic score components serving as deterministic tie-breakers, so that selection pressure reflects both task success and dialogue quality. Furthermore, α-Nego uses a value distribution to strengthen policy evaluation. Applying different criteria to this value distribution yields different styles of negotiation strategy that capture different risk attitudes. Empirical evaluation on the CraigslistBargain and DealOrNoDeal datasets shows that the α-Nego agent clearly outperforms the state-of-the-art baselines.
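
The tiered admission criterion described in the abstract can be sketched as a lexicographic comparison. This is only an illustration, not the paper's procedure. It assumes "utility dominance" means a strictly higher mean utility, that the tie-breakers are applied in the order Agreement, Length, Social welfare, and that shorter dialogues are preferred. The abstract specifies none of these details, and the `Candidate` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    utility: float         # mean task utility (primary criterion)
    agreement: float       # agreement rate; higher assumed better
    length: float          # mean dialogue length; shorter assumed better
    social_welfare: float  # joint utility of both sides; higher assumed better


def admission_key(c):
    """Lexicographic key: utility first, then deterministic tie-breakers.

    The tie-breaker order and the sign on length are assumptions.
    """
    return (c.utility, c.agreement, -c.length, c.social_welfare)


def select_for_pool(candidates):
    """Return the candidate that wins the tiered comparison."""
    return max(candidates, key=admission_key)


candidates = [
    Candidate("a", utility=0.6, agreement=0.8, length=10, social_welfare=1.0),
    Candidate("b", utility=0.7, agreement=0.5, length=20, social_welfare=0.9),
    Candidate("c", utility=0.7, agreement=0.9, length=25, social_welfare=0.8),
]

# "b" and "c" tie on utility, so the agreement rate decides between them.
print(select_for_pool(candidates).name)
```

Because Python compares tuples element by element, the later components matter only when all earlier ones are exactly equal. That matches the idea of utility as the primary criterion with deterministic tie-breakers. A real system would likely compare utilities within a tolerance, not by exact equality.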
