January 1, 2015

Batch learning from logged bandit feedback through counterfactual risk minimization

Abstract

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, rec...

Cite This Study

Adith Swaminathan et al. (2015) studied this question.

synapsesocial.com/papers/6a1bc17700ee29383e9cd429 https://doi.org/10.5555/2789272.2886805
