Q-learning as a monotone scheme | Synapse