In online second-hand marketplaces, multi-turn bargaining is a crucial part of seller-buyer interactions. Large Language Models (LLMs) can act as seller agents, negotiating with buyers on behalf of sellers under given business constraints. A critical ability for such agents is to track and accurately interpret cumulative buyer intents across long negotiations, which directly affects bargaining effectiveness. We introduce a multi-turn evaluation framework that measures the bargaining ability of seller agents in e-commerce dialogues by testing whether an agent can extract and track buyer intents. Our contributions are: (1) a large-scale e-commerce bargaining benchmark spanning 622 categories, 9,892 products, and 3,014 tasks; (2) a turn-level evaluation framework grounded in Theory of Mind (ToM), with annotated buyer intents, that moves beyond outcome-only metrics; and (3) an automated pipeline that extracts reliable buyer intents from large-scale dialogue data.
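To illustrate what turn-level scoring adds over an outcome-only metric, here is a minimal Python sketch. It assumes each buyer turn carries a set of annotated cumulative intents and that the agent outputs its own intent set at the same turn. The `Turn` schema, the intent labels, and the choice of exact-match accuracy plus set F1 are hypothetical illustrations, not the benchmark's actual data format or scoring method.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One buyer turn with gold (annotated) and agent-predicted cumulative intents.

    Hypothetical schema for illustration only; the benchmark's real format may differ.
    """
    gold: frozenset[str]
    predicted: frozenset[str]


def set_f1(gold: frozenset[str], predicted: frozenset[str]) -> float:
    """F1 between two intent sets; two empty sets count as a perfect match."""
    if not gold and not predicted:
        return 1.0
    overlap = len(gold & predicted)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def turn_level_scores(turns: list[Turn]) -> dict[str, float]:
    """Average per-turn exact-match accuracy and set F1 over one dialogue."""
    if not turns:
        raise ValueError("dialogue has no turns")
    exact = sum(t.gold == t.predicted for t in turns) / len(turns)
    f1 = sum(set_f1(t.gold, t.predicted) for t in turns) / len(turns)
    return {"exact_match": exact, "mean_f1": f1}


if __name__ == "__main__":
    # Hypothetical two-turn dialogue: the agent misses a budget intent at turn 2.
    dialogue = [
        Turn(gold=frozenset({"ask_discount"}), predicted=frozenset({"ask_discount"})),
        Turn(gold=frozenset({"ask_discount", "budget<=80"}), predicted=frozenset({"ask_discount"})),
    ]
    print(turn_level_scores(dialogue))
```

An outcome-only metric would only see whether a deal closed; the per-turn scores instead expose the turn at which the agent lost track of the buyer's cumulative intents.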
Yishu Wang, Kakam Chong, Xiaofeng Wang
www.synapsesocial.com/papers/68e02f3cf0e39f13e7fa26fa — DOI: https://doi.org/10.48550/arxiv.2509.06341