Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning | Synapse