Reinforcement learning approaches are increasingly used to model complex decision problems. The multi-armed bandit problem is a classical reinforcement learning problem that requires balancing exploration against exploitation, a trade-off that is fundamental to many reinforcement learning applications. Multi-armed bandit algorithms are used in industry domains such as computer games, clinical trials, telecommunications, and recommender systems. This paper studies the multi-armed bandit problem and contextualizes the algorithms to provide a framework for optimizing click-through rates in online advertising, thereby improving customer loyalty. To that end, parameterized bandit algorithms, namely upper confidence bound (UCB), epsilon-greedy (ε-greedy), and SoftMax, were implemented and tuned to maximize performance on an advertising platform. The results show that the algorithms are effective at choosing the best adverts. The UCB approach achieves the highest cumulative mean reward for arm selection over the iterations. The experiments indicate that the proposed system outperforms conventional techniques when ε and τ are set to 0.1, because it does not rely on data being available across varying cycles.
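To make the three selection rules concrete, the following is a minimal sketch of ε-greedy, SoftMax, and UCB applied to simulated advert clicks. It is not the authors' implementation: the function names, the Bernoulli click model, the example click-through rates, and the choice of the UCB1 variant with the bonus term sqrt(2 ln t / n) are all assumptions made for illustration. Only the settings ε = 0.1 and τ = 0.1 come from the abstract.

```python
import math
import random


def epsilon_greedy(counts, values, rng, epsilon=0.1):
    """With probability epsilon explore a random advert, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


def softmax(counts, values, rng, tau=0.1):
    """Sample an advert with probability proportional to exp(estimate / tau)."""
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / tau) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]


def ucb1(counts, values, rng=None):
    """UCB1 (assumed variant): try every advert once, then maximize estimate plus bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    return max(
        range(len(values)),
        key=lambda a: values[a] + math.sqrt(2 * math.log(total) / counts[a]),
    )


def simulate(select, click_rates, rounds, seed=0):
    """Run one policy against hypothetical Bernoulli adverts; return mean reward and pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(click_rates)
    values = [0.0] * len(click_rates)  # running mean reward per advert
    total_reward = 0
    for _ in range(rounds):
        arm = select(counts, values, rng)
        reward = 1 if rng.random() < click_rates[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return total_reward / rounds, counts


if __name__ == "__main__":
    rates = [0.02, 0.05, 0.10]  # hypothetical click-through rates, not from the paper
    for name, policy in [("epsilon-greedy", epsilon_greedy), ("softmax", softmax), ("ucb1", ucb1)]:
        mean_reward, pulls = simulate(policy, rates, rounds=5000)
        print(name, round(mean_reward, 4), pulls)
```

All three policies share the same interface (pull counts, running mean estimates, and a random generator), so they can be compared on identical simulated adverts. The relative ranking this sketch produces depends on the assumed click-through rates and the seed, and should not be read as a reproduction of the paper's results.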
Mambou et al. studied this question.