We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, rec...
Adith Swaminathan et al. studied this question.