Sample mean based index policies by O(log n) regret for the multi-armed bandit problem | Synapse