August 17, 2025

Rate-Optimal Online Learning for Dynamic Assortment Selection with Positioning

Key Points

The TLR-UCB algorithm achieves rate-optimal learning efficiency, supporting higher revenue.
Customer choices follow a multinomial logit model in which an item's display position significantly affects its appeal.
The proposed EI-TLR policy handles unknown position effects by jointly estimating customer preferences and positioning impacts.
In simulations, both TLR-UCB and EI-TLR significantly outperform existing benchmarks.
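To make the position-dependent multinomial logit model concrete, here is a minimal Python sketch of the resulting choice probabilities. It assumes, purely for illustration, that an item's attraction weight is exp(preference + position_effect), meaning each display slot adds a fixed utility shift. The names choice_probabilities, preference and position_effect are hypothetical and are not taken from the paper.

```python
import math

def choice_probabilities(assortment, preference, position_effect):
    """MNL choice probabilities when appeal depends on display position.

    Assumed (hypothetical) model: the item shown in slot k has attraction
    weight exp(preference[item] + position_effect[k]); the no-purchase
    option has weight 1.

    assortment: list of item ids; index k is the display slot
    preference: dict mapping item id -> intrinsic utility
    position_effect: list of additive utility shifts, one per slot
    Returns (dict item -> purchase probability, no-purchase probability).
    """
    weights = {
        item: math.exp(preference[item] + position_effect[k])
        for k, item in enumerate(assortment)
    }
    denom = 1.0 + sum(weights.values())  # the 1.0 is the no-purchase option
    probs = {item: w / denom for item, w in weights.items()}
    return probs, 1.0 / denom
```

For two items with equal intrinsic preference and a second slot that halves attraction (position_effect = [0.0, math.log(0.5)]), the top item is chosen with probability 0.4, the lower item with 0.2, and no purchase occurs with probability 0.4. Swapping the display order swaps the two item probabilities, which is the kind of position effect a position-blind model cannot capture.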

Abstract

This study addresses a key challenge in online retail: product positioning. The authors propose a novel online learning framework called dynamic assortment selection with positioning (DAP). Unlike traditional models that focus solely on item selection, DAP also learns optimal product placement to maximize revenue. The researchers model customer choices using a multinomial logit framework, where item appeal depends on both intrinsic preference and display position. They demonstrate that ignoring position effects leads to suboptimal performance and introduce a new algorithm, TLR-UCB, which effectively incorporates adaptive position-dependent feedback through a geometric linear bandit structure and truncated linear regression techniques. Theoretical analysis confirms that TLR-UCB achieves optimal learning efficiency. To handle unknown position effects, they further develop EI-TLR, a two-stage policy that jointly estimates customer preferences and positioning impacts before applying a generalized TLR-UCB procedure. Extensive simulations show that both TLR-UCB and EI-TLR significantly outperform existing benchmarks, offering powerful tools for dynamic, data-driven assortment and layout optimization in online marketplaces.
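The abstract's point that ignoring position effects leads to suboptimal performance can be illustrated with a brute-force oracle that, given known parameters, searches over both which items to show and in which slots to show them. This is a hypothetical sketch under the same assumed attraction form exp(preference + position_effect), with the further assumption that a smaller assortment fills the top slots first. It is not TLR-UCB or EI-TLR, which must learn these parameters online, and its exhaustive enumeration is only practical for a handful of items.

```python
import itertools
import math

def expected_revenue(ranking, price, preference, position_effect):
    """Expected revenue of showing ranking[k] in slot k under an assumed MNL model."""
    weights = [
        math.exp(preference[item] + position_effect[k])
        for k, item in enumerate(ranking)
    ]
    denom = 1.0 + sum(weights)  # includes the no-purchase option
    return sum(price[item] * w for item, w in zip(ranking, weights)) / denom

def best_positioned_assortment(items, price, preference, position_effect):
    """Enumerate every ordered selection of up to len(position_effect) items.

    Items occupy the top slots in order; returns the revenue-maximizing
    ranking (a tuple of item ids) and its expected revenue.
    """
    best, best_rev = (), 0.0
    for size in range(1, len(position_effect) + 1):
        for ranking in itertools.permutations(items, size):
            rev = expected_revenue(ranking, price, preference, position_effect)
            if rev > best_rev:
                best, best_rev = ranking, rev
    return best, best_rev
```

With prices 10, 8 and 1 for items a, b and c, equal preferences, and position_effect = [0.0, -1.0], the oracle shows a in the top slot and b below it. Showing the same two items in the order b then a earns less expected revenue, so the placement decision matters even when the assortment itself is fixed.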
