An offline actor-critic policy improvement algorithm with historical state-action pairs | Synapse