GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow | Synapse