Batch learning from logged bandit feedback through counterfactual risk minimization