May 27, 2024 · Open Access

Non-stochastic Bandits With Evolving Observations


Abstract

We introduce a novel online learning framework that unifies and generalizes previously established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the observed loss is arbitrary and may not correlate with the true loss incurred, with each round updating previous observations adversarially. We propose regret minimization algorithms for both the full-information and bandit settings, with regret bounds expressed in terms of the average feedback accuracy relative to the true loss. Our algorithms match the known regret bounds across many special cases, while also establishing previously unknown bounds.
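A small simulation can make the full-information version of this setting concrete. The sketch below is not the authors' algorithm. It is a standard exponential-weights (Hedge) learner, assumed here purely for illustration, which at every round discards its stored feedback and recomputes cumulative losses from the adversary's latest snapshot of all past rounds, so that earlier observations may be revised. Delayed feedback, one of the special cases the abstract names, is encoded as an observation that reads zero until the delay has elapsed and then shows the true loss. The names `EvolvingHedge`, `observe`, `delayed_regret`, and the fixed learning rate `eta` are hypothetical.

```python
import math
from typing import List, Sequence

LossVector = List[float]


class EvolvingHedge:
    """Exponential weights over n actions with full-information feedback.

    Hypothetical sketch (not the paper's method): after each round the learner
    receives a snapshot holding the currently observed loss vector of every
    past round. The adversary may rewrite earlier entries, so the learner
    replaces its old snapshot and recomputes cumulative losses from scratch.
    """

    def __init__(self, n_actions: int, eta: float) -> None:
        self.n = n_actions
        self.eta = eta  # learning rate, assumed fixed for simplicity
        self.snapshot: List[LossVector] = []

    def observe(self, snapshot: Sequence[Sequence[float]]) -> None:
        # Replace all previous observations with the latest (possibly revised) ones.
        self.snapshot = [list(row) for row in snapshot]

    def distribution(self) -> List[float]:
        # Cumulative observed loss per action, summed over all recorded rounds.
        cum = [sum(row[i] for row in self.snapshot) for i in range(self.n)]
        shift = min(cum)  # subtract the minimum for numerical stability
        weights = [math.exp(-self.eta * (c - shift)) for c in cum]
        total = sum(weights)
        return [w / total for w in weights]


def delayed_regret(
    true_losses: Sequence[Sequence[float]], delay: int, eta: float
) -> float:
    """Expected regret, measured on the TRUE losses, under delayed feedback.

    Delay is modelled as an evolving observation: round s shows an all-zero
    loss vector until `delay` further rounds have passed, then its true loss.
    """
    T, n = len(true_losses), len(true_losses[0])
    learner = EvolvingHedge(n, eta)
    learner_loss = 0.0
    for t in range(T):
        p = learner.distribution()
        # Expected true loss of the randomized play at round t.
        learner_loss += sum(pi * li for pi, li in zip(p, true_losses[t]))
        # Adversary's snapshot of rounds 0..t after round t is played.
        snapshot = [
            list(true_losses[s]) if t - s >= delay else [0.0] * n
            for s in range(t + 1)
        ]
        learner.observe(snapshot)
    best_fixed = min(sum(row[i] for row in true_losses) for i in range(n))
    return learner_loss - best_fixed
```

For example, with `losses = [[0.0, 1.0]] * 50` and `eta=0.5`, raising `delay` from 2 to 10 increases the regret returned by `delayed_regret`, because more early rounds are played while the observations are still uninformative. Recomputing from the full snapshot each round costs time linear in the number of past rounds; it is chosen here for clarity rather than efficiency.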


Cite This Study

Bar-On et al. (2024) studied this question.

synapsesocial.com/papers/68e68593b6db64358760de40 https://doi.org/10.48550/arxiv.2405.16843
