Relative Q-Learning for Average-Reward Markov Decision Processes With Continuous States | Synapse