Adaptive discount factor for accelerating policy learning considering long-term returns in reinforcement learning with non-stationary environments