Stochastic policy gradient reinforcement learning on a simple 3D biped | Synapse