Reinforcement Q-learning optimal control of 2D discrete-time systems with unknown dynamics | Synapse