We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, rec...
Adith Swaminathan et al. studied this question.