Large-scale deep reinforcement learning method for energy management of power supply units considering regulation mileage payment