Controlling underestimation bias in reinforcement learning via minmax operation | Synapse