January 1, 2015

Batch learning from logged bandit feedback through counterfactual risk minimization

Abstract

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, rec...

Cite This Study

Swaminathan, Adith et al. (2015) studied this question.

synapsesocial.com/papers/6a1bc17700ee29383e9cd429 https://doi.org/10.5555/2789272.2886805
